LDBC Semantic Publishing Benchmark evolution. The Semantic Publishing Benchmark (SPB) needed to evolve in order to allow retrieval of semantically relevant content based on rich metadata descriptions and diverse reference knowledge, and to demonstrate that triplestores offer simpler and more efficient querying, keeping SPB a challenging benchmark in the RDF and graph arena. The presentation (March 2015) describes all the changes.
2. Outline
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
3. Change is Needed
SPB needed to evolve in order to:
• Allow for retrieval of semantically relevant content
– Based on rich metadata descriptions and diverse reference knowledge
• Demonstrate that triplestores offer simplified and more efficient querying
And this way to:
• Cover more advanced and pertinent usage of triplestores
• Present a bigger challenge to the engines
4. • Background and issues with SPB 1.0:
– “Reference data” = master data in Dynamic Semantic Publishing
• Essentially, taxonomies and entity datasets used to describe “creative works”
– SPB 1.0 has very small set of reference data
– Reference data is not interconnected – few relations between entities
– All descriptions of content assets (articles, pictures, etc.) refer to the
Reference Data, but not to one another
– Star-shaped graph with small reference dataset in the middle
• This way SPB is not using the full potential of graph databases
• Bigger and better connected Reference Data
– More reference data and more connections between the entities
• A better connected dataset
– Make cross references between the different pieces of content
Directions for Improvement
#4LDBC Semantic Publishing Benchmark Evolution Mar 2015
5. • Make use of inference
– In SPB 1.0 it is really trivial: a couple of subclasses and sub-properties
– It would be relevant to have some transitive, symmetric and inverse
properties
– Such semantics can foster retrieval of relevant content
• More interesting and challenging query mixes
– This can go in plenty of different directions, but the most obvious one
is to evolve the queries so that they benefit from the changes above
– In the bigger picture, there should be a way to make some of the
queries nicer, i.e. simpler and cleaner
• Testing high-availability
– FT acceptance tests for HA cluster are great starting point
– Too ambitious for SPB 2.0
Directions for Improvement (2)
#5LDBC Semantic Publishing Benchmark Evolution Mar 2015
6. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#6LDBC Semantic Publishing Benchmark Evolution Mar 2015
7. • 22M explicit statements total in SPB v.2.0
– vs. 127k statements in SPB 1.0; BBC lists + tiny fractions of GeoNames
• Added reference data from DBPedia 2014
– Extracts obtained with queries like: DESCRIBE ?e WHERE { ?e a dbp-ont:Company }
– Companies (85 000 entities)
– Events (50 000 entities)
– Persons (1M entities)
• Geonames data for Europe
– All European countries, w/o RUS, UKR, BLR; 650 000 locations total
– Some properties and location types irrelevant to SPB excluded
• owl:sameAs links between DBpedia and Geonames
– 500k owl:sameAs mappings
Extended Reference Dataset
#7LDBC Semantic Publishing Benchmark Evolution Mar 2015
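For reference, a self-contained form of the extraction query quoted on the slide above; the dbp-ont prefix binding is an assumption (the DBpedia ontology namespace), as the slides do not spell it out:

PREFIX dbp-ont: <http://dbpedia.org/ontology/>
DESCRIBE ?e
WHERE { ?e a dbp-ont:Company }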
8. • Substantial volume of connections between entities
• Geonames comes with a hierarchical relationship, gn:parentFeature,
defining the nesting of locations
• DBPedia inter-entity relationships between entities*:

            To Company   To Person   To Place / gn:Feature   To Event
Company         40,797      26,675                  218,636         18
Person          89,506   1,324,425                3,380,145    145,892
Event            5,114     154,207                  140,579     35,442

* numbers shown in the table are approximate
Interconnected Reference Dataset
#8LDBC Semantic Publishing Benchmark Evolution Mar 2015
9. • GraphDB’s RDFRank feature is used to calculate a
rank for all the entities in the Reference Data
– RDFRank calculates a measure of importance for all URIs in an RDF
graph, based on Google’s PageRank algorithm
• These RDFRanks are calculated using GraphDB after
the entire Reference Data is loaded
– This way all sorts of relationships in the graph are considered during
the calculations, including those that were logically inferred
• RDFRanks are inserted using predicate
<http://www.ldbcouncil.org/spb#hasRDFRank>
• These ranks are available as a part of the Ref. Data
– No need to compute them again during loading
RDF-Rank for the Reference Data
#9LDBC Semantic Publishing Benchmark Evolution Mar 2015
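Because the ranks are materialized with the predicate shown above, they can be retrieved with an ordinary query. A minimal sketch, assuming an spb prefix bound to http://www.ldbcouncil.org/spb#:

PREFIX spb: <http://www.ldbcouncil.org/spb#>
SELECT ?entity ?rank
WHERE { ?entity spb:hasRDFRank ?rank }
ORDER BY DESC(?rank)
LIMIT 100

The "popular entities" discussed on the next slide are simply the top 5% of this ordering.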
10. • The Data Generator uses a set of “popular entities”
– Those are referred to in 30% of the content-to-entity relations/tags
– This is one of the heuristics used by the data generator to produce
more realistic data distributions
– This is implemented in SPB 1.0, no change in SPB 2.0
• Popular entities are those with top 5% RDF Rank
– This is the new thing in SPB v.2.0
– Before that the popular entities were selected randomly
– This way, we get a more realistic dataset where those entities which
are more often used to tag content also have better connectivity in the
Reference Data
• In the future, RDF-ranks can be used for other
purposes also
RDF-Rank Used for Popular Entities
#10LDBC Semantic Publishing Benchmark Evolution Mar 2015
11. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#11LDBC Semantic Publishing Benchmark Evolution Mar 2015
12. • Generated three times more relationships between
Creative Works and Entities than in SPB 1.0
– More recent use cases in publishing adopt rich metadata descriptions
with more than 10 references to relevant entities and concepts
– In SPB 1.0 there are on average a bit less than 3 entity references per
creative work, based on distributions from an old BBC archive
– While developing SPB 2.0 we didn't have access to a substantial amount
of rich semantic metadata from a publisher's archive that would allow us
to derive content-to-concept frequency distributions
– So, we decided to use the same distributions from the BBC, but to triple
the concept references for each of them
– In SPB 2.0 there are on average about 8 content-to-concept references
– This results in about 25 statements/creative work description
– All other heuristics and specifics of the metadata generator were
preserved (popular entities, CW sequences alike storylines, etc.)
Changes in the Metadata
#12LDBC Semantic Publishing Benchmark Evolution Mar 2015
13. • Geonames URIs used for content-to-location
references
– As in SPB v.1.0, each creative work description refers (through the
cwork:mentions property) to exactly one location
• These references are exploited in queries with geo-spatial constraints
– While most of the geo-spatial information in the Reference Data
comes from Geonames, for about 500 thousand locations there are
also DBPedia URIs, which come from the DBPedia-to-Geonames
owl:sameAs mappings
– Intentionally, DBPedia URIs are not used when metadata about
creative works is generated
• In contrast, the substitution parameters for locations in the corresponding queries
use only DBpedia locations
• This way the corresponding queries return no results if proper owl:sameAs
expansion is not performed
Changes in the Metadata (2)
#13LDBC Semantic Publishing Benchmark Evolution Mar 2015
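To make the dependency on owl:sameAs concrete: a query whose substitution parameter is a DBpedia location URI can only match creative works tagged with the equivalent Geonames URI when sameAs expansion is in effect. A simplified sketch; the cwork prefix binding (BBC Creative Work ontology namespace) is an assumption:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
SELECT ?cw
WHERE {
  # The generator tags creative works with Geonames URIs only, so this
  # DBpedia URI matches solely through the owl:sameAs mapping.
  ?cw cwork:mentions <http://dbpedia.org/resource/Sofia> .
}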
14. • As a result of changes in the size of the Reference
Data and the metadata size, there are differences in
the number of Creative Works included at the same
scale factor in SPB v.1.0 and in SPB v.2.0
                                                 SPB 1.0   SPB 2.0
Reference Data size (explicit statements)           170k       22M
Creative Work descr. size (explicit st./CW)           19        25
Metadata in 50M dataset (explicit statements)        50M       28M
Creative Works in 50M dataset (count)               2.6M      1.1M
Creative Work-to-Entity relationships in 50M          7M        9M
Metadata in 1B dataset (explicit statements)           1B      978M
Creative Works in 1B dataset (count)                 53M       39M
Creative Work-to-Entity relationships in 1B         137M      313M
Changes in the Metadata - Stats
#14LDBC Semantic Publishing Benchmark Evolution Mar 2015
15. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#15LDBC Semantic Publishing Benchmark Evolution Mar 2015
16. The Rules Sets
LDBC Semantic Publishing Benchmark Evolution #16
• Complete inference in SPB 2.0 requires support for
the following primitives:
– RDFS: subPropertyOf, subClassOf
– OWL: TransitiveProperty, SymmetricProperty, sameAs
• Any triplestore with OWL 2 RL reasoning support will
be able to process SPB v2.0 correctly
– In fact, a much reduced rule-set is sufficient for complete reasoning,
as OWL 2 RL contains a host of primitives and rules that SPB 2.0 does
not make use of. Such examples are all onProperty restrictions, class
and property equivalence, property chains, etc.
– The popular (but not W3C-standardized) OWL Horst profile is sufficient
for reasoning with SPB v 2.0
• OWL Horst refers to the pD* entailment defined by Herman Ter Horst in:
Completeness, decidability and complexity of entailment for RDF Schema and a
semantic extension involving the OWL vocabulary. J. Web Sem. 3(2-3): 79-115 (2005)
Mar 2015
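To illustrate how small the required rule-set is: each of the needed OWL primitives corresponds to a single rule. As a sketch (not part of the benchmark, which relies on the engine's own rule-set), the owl:TransitiveProperty rule can be written as a SPARQL CONSTRUCT:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?x ?p ?z }
WHERE {
  ?p a owl:TransitiveProperty .
  ?x ?p ?y .
  ?y ?p ?z .
}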
17. • Modified versions of the BBC Ontologies
– As in SPB v.1.0, BBC Core ontology defines relationships between
Person – Organizations – Locations - Events
– Those were mapped to corresponding DBPedia and Geonames classes
and relationships
• bbccore:Thing is defined as super class of dbp-ont:Company, dbp-ont:Event,
foaf:Person, geonames-ontology:Feature (geographic feature)
– dbp-ont:parentCompany is defined to be owl:TransitiveProperty
– These extra definitions are added at the end of the BBC Core ontology
• Added a SPB-modified version of the Geonames
ontology
– Stripped out unnecessary pieces such as onProperty restrictions that
impose cardinality constraints
New and Updated Ontologies
LDBC Semantic Publishing Benchmark Evolution #17Mar 2015
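A sketch of what the extra definitions appended to the BBC Core ontology could look like, written here as a SPARQL update; the bbccore and dbp-ont prefix bindings are assumptions, since the slides do not list the namespace IRIs:

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX gn:      <http://www.geonames.org/ontology#>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>                    # assumed binding
PREFIX bbccore: <http://www.bbc.co.uk/ontologies/coreconcepts/>   # assumed binding
INSERT DATA {
  dbp-ont:Company rdfs:subClassOf bbccore:Thing .
  dbp-ont:Event   rdfs:subClassOf bbccore:Thing .
  foaf:Person     rdfs:subClassOf bbccore:Thing .
  gn:Feature      rdfs:subClassOf bbccore:Thing .
  dbp-ont:parentCompany a owl:TransitiveProperty .
}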
18. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#18LDBC Semantic Publishing Benchmark Evolution Mar 2015
19. • Transitive closure of location-nesting relationships
– geo-ont:parentFeature property is used in GeoNames to link each
location to larger locations that it is a part of
– geo-ont:parentFeature is owl:TransitiveProperty in the GN ontology
– This way, if Munich has gn:parentFeature Bavaria, which in turn has
gn:parentFeature Germany, then reasoning should make sure that
Germany also appears as gn:parentFeature of Munich
• Transitive closure of dbp-ont:parentCompany
– Inference unveils indirect company control relationships
• owl:sameAs relations between Geonames features
and DBPedia URIs for the same locations:
– The standard semantics of owl:sameAs requires that each statement
asserted using the Geonames URIs should be inferable/retrievable also
with the mapped DBPedia URIs
Concrete Inference Patterns
LDBC Semantic Publishing Benchmark Evolution #19Mar 2015
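The Munich example above can be turned into a simple check: after inference, the following ASK should return true. The two URIs are placeholders for the respective Geonames feature URIs (real features are identified by numeric IDs), and only the Munich-to-Bavaria and Bavaria-to-Germany links are asserted explicitly:

PREFIX geo-ont: <http://www.geonames.org/ontology#>
ASK {
  # placeholder URIs; this link should hold only via transitive inference
  <http://sws.geonames.org/Munich> geo-ont:parentFeature <http://sws.geonames.org/Germany> .
}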
20. • The inference patterns from SPB v.1.0 are still there:
– cwork:tag statements inferred from each of its sub-properties
cwork:about and cwork:mentions
– < ?cw rdf:type cwork:CreativeWork > statements inferred for
instances of each of its sub-classes
– These two are the only reasoning patterns that apply to the generated
content metadata; all the others apply only to the Reference Data
Concrete Inference Patterns (2)
LDBC Semantic Publishing Benchmark Evolution #20Mar 2015
21. • If brute-force materialization is used, the Reference
Data expands from 22M statements to about 100M
– If owl:sameAs expansion is not considered, materialization on top of
the Reference Data adds about 7M statements – most of those coming
from the transitive closure of geo-ont:parentFeature
– Several triplestores (e.g. ORACLE and GraphDB) that use forward-
chaining have a specific mechanism which allows them to handle
owl:sameAs reasoning in a hybrid manner, without expanding
their indices. After materialization, such engines will have to deal with
29M statements of Reference Data, instead of 100M
• Brute-force materialization of the generated
metadata describing creative works would double it
– In SPB v.1.0 the expansion factor was 1.6, but now there are more
entity references and also owl:sameAs equivalents of the location URIs
Inference Statistics and Ratios
LDBC Semantic Publishing Benchmark Evolution #21Mar 2015
22. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#22LDBC Semantic Publishing Benchmark Evolution Mar 2015
23. Modified Queries in Basic Interactive
LDBC Semantic Publishing Benchmark Evolution #23
• SPB 2.0 changes only the Basic Interactive Query-Mix
– The other query mixes are mostly unchanged
• In most of the queries, the cwork:about and
cwork:mentions properties that link a creative work to
an entity have been replaced with their super-property
cwork:tag
– This applies also to the Advanced Interactive and the Analytical use
cases
– For most of the queries, both types of relations are equally relevant
– Using the super-property makes the query simpler, as compared to
other approaches that use both relations (e.g. UNION); see the sketch after this slide
Mar 2015
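For comparison, the two equivalent formulations as separate queries: the first spells out both sub-properties, the second relies on rdfs:subPropertyOf inference (cwork prefix binding assumed as before):

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
# Without inference: enumerate the sub-properties explicitly
SELECT ?cw WHERE {
  { ?cw cwork:about ?e } UNION { ?cw cwork:mentions ?e }
}

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
# With rdfs:subPropertyOf inference: a single pattern on the super-property
SELECT ?cw WHERE {
  ?cw cwork:tag ?e .
}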
24. New Queries
LDBC Semantic Publishing Benchmark Evolution #24
• Basic Interactive query-mix now adds 2 new queries
exploring the interconnectedness in reference data
• Q10: Retrieve CWs that mention locations in the
same province (A.ADM1) as the specified one
– There is an additional constraint on the time interval (5 days)
• Q11: Retrieve the most recent CWs that are tagged
with entities related to a specific popular entity
– Relations can be inbound and outbound; explicit or inferred
Mar 2015
25. Q10: News from the region
LDBC Semantic Publishing Benchmark Evolution #25
SELECT ?cw ?title ?dateModified {
<http://dbpedia.org/resource/Sofia> geo-ont:parentFeature ?province .
?province geo-ont:featureCode geo-ont:A.ADM1 .
{
?location geo-ont:parentFeature ?province .
} UNION {
BIND(?province as ?location) .
}
?cw a cwork:CreativeWork ;
cwork:tag ?location ;
cwork:title ?title ;
cwork:dateModified ?dateModified .
FILTER(?dateModified >=
"2011-05-14T00:00:00.000"^^<http://www.w3.org/2001/XMLSchema#dateTime>
&& ?dateModified <
"2011-05-19T23:59:59.999"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}
LIMIT 100
Mar 2015
26. Q11: News about related entities
LDBC Semantic Publishing Benchmark Evolution #26
SELECT DISTINCT ?cw ?title ?description ?dateModified ?primaryContent
{
{
<http://dbpedia.org/resource/Teresa_Fedor> ?p ?e .
}
UNION
{
?e ?p <http://dbpedia.org/resource/Teresa_Fedor> .
}
?e a core:Thing .
?cw cwork:tag ?e ;
cwork:title ?title ;
cwork:description ?description ;
cwork:dateModified ?dateModified ;
bbc:primaryContentOf ?primaryContent .
}
ORDER BY DESC(?dateModified)
LIMIT 100
Mar 2015
27. Trivial Validation Opportunities
LDBC Semantic Publishing Benchmark Evolution #27
• The validation mechanisms from SPB v.1.0 remain
unchanged
• In addition to this, the changes to the benchmark are
designed in such a way that, if specific inference
patterns are not supported by the engine,
some of the queries will return zero results
– If owl:sameAs inference is not supported, Q10 will return 0 results
– If transitive properties are not supported, Q10 will return 0 results
• Q11 will return a smaller number of results (and a higher ratio of empty results)
– If rdfs:subPropertyOf is not supported, Q2-Q10 will return 0 results
– If rdfs:subClassOf is not supported, Q1, Q2, Q6, Q8 will return no results
Mar 2015
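One cheap sanity check for the rdfs:subPropertyOf closure is to count matches on the super-property: assuming the data generator emits only cwork:about and cwork:mentions statements (as described earlier), this count stays at zero when sub-property inference is missing. A sketch, with the cwork prefix binding assumed as before:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
SELECT (COUNT(*) AS ?tagged)
WHERE { ?cw cwork:tag ?entity }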
28. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• The New Queries
• GraphDB Configuration Disclosure
• Future plans
Outline
#28LDBC Semantic Publishing Benchmark Evolution Mar 2015
29. 1. Reasoning performed through forward-chaining
with a custom ruleset
– This is the rdfs-optimized ruleset of GraphDB with added rules to
support transitive, inverse and symmetric properties
2. owl:sameAs optimization of GraphDB is beneficial
– This is the standard behavior of GraphDB; but it helps a lot
– Query-time tuning is used to disable expansion of results with
respect to owl:sameAs equivalent URIs
3. Geo-spatial index in Q6 of the basic interactive mix
4. Lucene Connector for full-text search in Q8
Note: Points #2-#4 above help query time, but slow
down updates in GraphDB, as does materialization
GraphDB Configuration
LDBC Semantic Publishing Benchmark Evolution #29Mar 2015
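Regarding point #2: the query-time tuning mentioned above is done in GraphDB by referencing a special pseudo-graph that switches off enumeration of owl:sameAs-equivalent URIs in query results. A sketch following the publicly documented GraphDB mechanism (the exact syntax available in the GraphDB-SE 6.1 release used here may differ):

# The FROM clause names GraphDB's pseudo-graph that disables owl:sameAs expansion of results
SELECT ?cw
FROM <http://www.ontotext.com/disable-sameAs>
WHERE {
  ?cw ?p <http://dbpedia.org/resource/Sofia> .
}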
30. • Experimental run using GraphDB-SE 6.1
• Using LDBC Scale Factor 1 - (22M reference data +
30M generated data)
• Hardware: 32G RAM, Intel i7-4770 CPU @ 3.40GHz,
single SSD Drive Samsung 845 DC
• Benchmark configuration: 8 reading / 2 writing
agents, 1 min warmup, 40 min benchmark
• Updates/sec: 13, Selects/sec: 44
GraphDB – First Results
LDBC Semantic Publishing Benchmark Evolution #30Mar 2015
31. • Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#31LDBC Semantic Publishing Benchmark Evolution Mar 2015
32. • Relationships between pieces of content
– At present Creative Works are related only to Entities, not to other
CWs
– These could be StoryLines or Collections of MM assets, e.g. an article
with a few images related to it
• Better modelling of content-to-entity cardinalities
– Based on data from FT
• Enriched query sets and realistic query frequencies
– Based on query logs from FT
• Loading/updating big dataset in live instances
• Testing high-availability
– FT acceptance tests for High Availability cluster are great starting point
Future Plans
#32LDBC Semantic Publishing Benchmark Evolution Mar 2015
33. • Much bigger reference dataset: from 170k to 22M
– 7M statements from GeoNames about Europe
– 14M statements from DBpedia for Companies, Persons, Events
• Interconnected reference data: more than 5M links
– owl:sameAs links between DBPedia and Geonames
• Much more comprehensive usage of inference
– While still the simplest possible flavor of OWL is used
• rdfs:subClassOf, rdfs:subPropertyOf, owl:TransitiveProperty, owl:sameAs
– Transitive closure over company control and geographic nesting
– Simpler queries through usage of super-property
• Retrieval of relevant content through links in the
reference data
Summary of SPB v.2.0 Changes
#33LDBC Semantic Publishing Benchmark Evolution Mar 2015