TripleWave: Spreading RDF Streams on the Web - Andrea Mauri
TripleWave is an open-source framework for creating and publishing RDF streams over the Web. It converts various data sources like temporal RDF datasets and web streams into RDF streams. TripleWave makes these streams available via standard protocols and allows consuming applications to access the streams through pull via Linked Data principles or push using RSP services. The framework is implemented in NodeJS and available on GitHub to help spread the use of RDF streams on the semantic web.
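To make the push option concrete, here is a minimal sketch of what a WebSocket consumer of such a stream might look like; the endpoint URL and JSON-LD frame layout are invented for illustration and are not TripleWave's actual interface.

```python
# Minimal sketch of a push-based RDF stream consumer, assuming the
# stream is exposed over WebSocket and emits one JSON-LD graph per
# message. The endpoint URL and payload layout are hypothetical.
import asyncio
import json

import websockets  # third-party: pip install websockets


async def consume(endpoint: str) -> None:
    async with websockets.connect(endpoint) as ws:
        async for message in ws:
            graph = json.loads(message)  # one timestamped JSON-LD graph
            # Application logic goes here; we just print the graph id.
            print(graph.get("@id"), len(graph.get("@graph", [])), "triples")


if __name__ == "__main__":
    # Hypothetical TripleWave endpoint; replace with a real stream URI.
    asyncio.run(consume("ws://example.org/triplewave/stream"))
```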
The document discusses requirements and approaches for RDF stream processing (RSP). It covers the following key points:
RSP aims to process continuous RDF streams to address scenarios like sensor data and social media. It involves querying streaming data, integrating streams with static data, and handling issues like imperfections. The document reviews existing RSP systems and languages, actor-based approaches, and the 8 requirements for real-time stream processing including keeping data moving, generating predictable outcomes, and responding instantaneously.
This document provides an overview of RDF stream processing and existing RDF stream processing engines. It discusses RDF streams and how sensor data can be represented as RDF streams. It also summarizes some existing RDF stream processing query languages and systems, including C-SPARQL, and the features they support like continuous execution, operators, and time-based windows. The document is intended as a tutorial for developers on working with RDF stream processing.
Brief report on the contents of the Stream Reasoning workshop at SIWC 2016. Additional info about the event is available at: http://streamreasoning.org/events/sr2016
This document summarizes Jean-Paul Calbimonte's presentation on connecting stream reasoners on the web. It discusses representing data streams as RDF and using RDF stream processing systems. Key points include:
- RDF streams can be represented as sequences of timestamped RDF graphs (a minimal sketch of this representation follows the list).
- The W3C RSP community group is working to standardize RDF stream models and query languages.
- Producing RDF streams involves mapping live data sources to RDF and adding timestamps.
- Consuming RDF streams involves discovering stream metadata and endpoints to access the streams.
- Systems like TripleWave demonstrate approaches for spreading RDF streams on the web.
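To make the first bullet concrete, here is a minimal rdflib sketch of a stream element as a timestamped named graph; the IRIs and sensor vocabulary are invented, and prov:generatedAtTime is just one common choice of timestamp property.

```python
# Sketch: an RDF stream element as a timestamped named graph (rdflib).
# The stream/graph IRIs and sensor vocabulary are illustrative only.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import PROV, XSD

EX = Namespace("http://example.org/")

ds = Dataset()

# Content of the stream element: one named graph of sensor readings.
g = ds.graph(URIRef("http://example.org/stream/graph/42"))
g.add((EX.sensor1, EX.hasReading, Literal(21.5, datatype=XSD.double)))

# Stream-level metadata: when this graph was generated (its timestamp).
ds.add((g.identifier, PROV.generatedAtTime,
        Literal("2016-05-04T12:00:00Z", datatype=XSD.dateTime)))

print(ds.serialize(format="trig"))
```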
Triplewave: a step towards RDF Stream Processing on the Web - Daniele Dell'Aglio
The slides of my talk at INSIGHT Centre for Data Analytics (in NUI Galway) where I presented TripleWave (http://streamreasoning.github.io/TripleWave/), an open-source framework to create and publish streams of RDF data.
RSP4J is an API for RDF stream processing that addresses gaps in existing solutions. It provides an extensible architecture, declarative access to streams and queries, and programming abstractions to enable fast prototyping, benchmarking, and dissemination of RSP techniques. The API generalizes common operators, implements execution semantics of engines like YASPER and CSPARQL 2.0, and allows controlling query execution. RSP4J aims to foster practical RSP research by simplifying the implementation of new solutions, engines, and optimizations.
The document discusses the need for a W3C community group on RDF stream processing. It notes there is currently heterogeneity in RDF stream models, query languages, implementations, and operational semantics. The speaker proposes creating a W3C community group to better understand these differences, requirements, and potentially develop recommendations. The group's mission would be to define common models for producing, transmitting, and continuously querying RDF streams. The presentation provides examples of use cases and outlines a template for describing them to collect more cases to understand requirements.
Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines - Riccardo Tommasini
Benchmarks like LSBench, SRBench, CSRBench and, more recently, CityBench satisfy the growing need for shared datasets, ontologies and queries to evaluate window-based RDF Stream Processing (RSP) engines. However, no clear winner emerges out of the evaluation. In this paper, we claim that the RSP community needs to adopt a Systematic Comparative Research Approach (SCRA) if it wants to move a step forward. To this end, we propose a framework that enables SCRA for window-based RSP engines. The contributions of this paper are: (i) the requirements to satisfy for tools that aim at enabling SCRA; (ii) the architecture of a facility to design and execute experiments guaranteeing repeatability, reproducibility and comparability; (iii) Heaven, a proof-of-concept implementation of such an architecture that we released as open source; (iv) two RSP engine implementations, also open source, that we propose as baselines for the comparative research (i.e., they can serve as terms of comparison in future works). We prove Heaven's effectiveness using the baselines by: (i) showing that top-down hypothesis verification is not straightforward even in controlled conditions and (ii) providing examples of bottom-up comparative analysis.
A Hierarchical approach towards Efficient and Expressive Stream Reasoning - Riccardo Tommasini
Abstract. Many approaches have been proposed for Stream Reasoning (SR). Some of them combine information flow processing (IFP) techniques and semantic technologies to make sense in real time of noisy, vast and heterogeneous data streams that come from complex domains. More recent works have shown the presence of a trade-off between throughput and reasoning expressiveness. Indeed, systems with IFP-like performance are not really expressive (e.g. up to an RDFS subset) and vice versa. For static data, Information Integration (II) systems have already approached the problem. The idea consists in spreading the reasoning complexity over different layers of a hierarchical architecture and treating it where it is easier to do. Is it possible to realize expressive and efficient stream reasoning (E2SR) by defining a hierarchical approach that adapts II techniques to the streaming scenario? In this paper, I discuss my plan towards E2SR, the intuition of adapting Information Integration techniques to the streaming scenario, and Stream Reasoning's need for comparative analysis to support its technological progress.
RSP-QL*: Querying Data-Level Annotations in RDF Streams - keski
This document proposes an extension to RSP-QL called RSP-QL* that allows querying of statement-level annotations in RDF streams. RSP-QL* uses the RDF* model, which allows embedding RDF triples as the subject or object of other triples. This provides an efficient way to represent statement-level metadata in RDF. The semantics of RSP-QL are extended to support RSP-QL* patterns, which can include basic graph patterns, named graphs, windows and other operators. Future work includes adding more functionality to the RDF* model, prototyping an implementation, and evaluating performance.
This document discusses demos and tools for linking knowledge discovery (KDD) and linked data. It summarizes several tools that integrate linked data and KDD processes like data preprocessing, mining, and postprocessing. OpenRefine, RapidMiner, R, Matlab, ProLOD++, DL-Learner, Spark, KNIME, and Gephi were highlighted as tools that support tasks like enriching data, running SPARQL queries, loading RDF data, and visualizing linked data. The document concludes by asking about gaps and how to increase adoption, noting linked data could benefit KDD with validation, enrichment, and reasoning over semantic web data.
Information-Rich Programming in F# with Semantic Data - Steffen Staab
Programming with rich data frequently implies that one needs to search for, understand, integrate and program with new data, with each of these steps constituting a major obstacle to successful data use.
In this talk we will explain and demonstrate how our approach, LITEQ - Language Integrated Types, Extensions and Queries for RDF Graphs, which is realized as part of the F# / Visual Studio environment, supports the software developer. Using the extended IDE the developer may now:
a. explore new, previously unseen data sources, which are either natively in RDF or mapped into RDF;
b. use the exploration of schemata and data in order to construct types and objects in the F# environment;
c. automatically map between data and programming language objects in order to make them persistent in the data source;
d. have extended typing functionality added to the F# environment, resulting from the exploration of the data source and its mapping into F#.
Core to this approach is the novel node path query language, NPQL, which allows for interactive, intuitive exploration of data schemata and data proper, as well as for the mapping and definition of types, object collections and individual objects. Beyond the existing type provider mechanism for F#, our approach also allows for property-based navigation and runtime querying for data objects.
Save queries as annotations: a method for the digital preservation of queries on a Hebrew Text database enriched with linguistic information. These queries form the data for interpretations by biblical scholars. Sharing those queries as Open Annotation enables researchers to communicate their (intermediate) results.
This document introduces R and its integration with SparkR and Spark's MLlib machine learning library. It provides an overview of R and some of its most common data types like vectors, matrices, lists, and data frames. It then discusses how SparkR allows R to leverage Apache Spark's capabilities for large-scale data processing. SparkR exposes Spark's RDD API as distributed lists in R. The document also gives examples of using SparkR for tasks like word counting. It provides an introduction to machine learning concepts like supervised and unsupervised learning, and gives Naive Bayes classification as an example algorithm. Finally, it discusses how MLlib can currently be accessed from R through rJava until full integration with SparkR is completed.
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX - rhatr
Graph relationships are everywhere. In fact, more often than not, analyzing relationships between points in your datasets lets you extract more business value from your data.
Consider social graphs, or relationships of customers to each other and products they purchase, as two of the most common examples. Now, if you think you have a scalability issue just analyzing points in your datasets, imagine what would happen if you wanted to start analyzing the arbitrary relationships between those data points: the amount of potential processing will increase dramatically, and the kind of algorithms you would typically want to run would change as well.
While a batch-oriented Hadoop approach with MapReduce may work reasonably well elsewhere, for scalable graph processing you have to embrace an in-memory, explorative, and iterative approach. One of the best ways to tame this complexity is known as the Bulk Synchronous Parallel approach. Its two most widely used implementations are available as Hadoop ecosystem projects: Apache Giraph (used at Facebook) and Apache GraphX (part of the Spark project).
In this talk we will focus on practical advice: how to get up and running with Apache Giraph and GraphX, start analyzing simple datasets with built-in algorithms, and implement your own graph processing applications using the APIs provided by the projects. We will then compare and contrast the two and try to lay out some principles for when to use one versus the other.
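Neither Giraph's nor GraphX's API is reproduced here; as a language-neutral illustration of the Bulk Synchronous Parallel model both build on, the following pure-Python sketch runs Pregel-style supersteps to compute single-source shortest paths on a toy graph.

```python
# Plain-Python illustration of Pregel-style BSP supersteps: vertices
# process incoming messages, then a global barrier, repeated until no
# vertex has mail. Computes single-source shortest paths on a toy graph.
INF = float("inf")
edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
value = {v: INF for v in edges}
inbox = {v: [] for v in edges}
inbox["a"].append(0)  # seed the source vertex

while any(inbox.values()):
    outbox = {v: [] for v in edges}
    for v, messages in inbox.items():          # "compute" phase
        if messages and min(messages) < value[v]:
            value[v] = min(messages)
            for neighbor, weight in edges[v]:  # send updated distances
                outbox[neighbor].append(value[v] + weight)
    inbox = outbox                             # barrier: next superstep

print(value)  # {'a': 0, 'b': 1, 'c': 3}
```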
Introduction to Spark R with R studio - Mr. Pragith - Sigmoid
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. RStudio IDE is a powerful and productive user interface for R. It's free and open source, and available on Windows, Mac, and Linux.
The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced "lark"), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform, and describes the data model used within LarKC.
First impressions of SparkR: our own machine learning algorithm - InfoFarm
In June 2015, SparkR was first integrated into Apache Spark. At InfoFarm we strive to stay on top of new technologies, hence we have tried it out and implemented a few machine learning algorithms as well.
LDQL: A Query Language for the Web of Linked Data - Olaf Hartig
I used this slideset to present our research paper at the 14th Int. Semantic Web Conference (ISWC 2015). Find a preprint of the paper here:
http://olafhartig.de/files/HartigPerez_ISWC2015_Preprint.pdf
This presentation will describe how to go beyond a "Hello world" stream application and build a real-time data-driven product. We will present architectural patterns, go through tradeoffs and considerations when deciding on technology and implementation strategy, and describe how to put the pieces together. We will also cover necessary practical pieces for building real products: testing streaming applications, and how to evolve products over time.
Presented at highloadstrategy.com 2016 by Øyvind Løkling (Schibsted Products & Technology), joint work with Lars Albertsson (independent, www.mapflat.com).
Presentation done* at the 13th International Semantic Web Conference (ISWC) in which we approach a compressed format to represent RDF Data Streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
This document discusses big data and data-intensive science. It introduces the Lambda architecture, which processes streaming data in both batch and speed layers to generate real-time and batch views. The batch layer precomputes queries from all available data. The serving layer indexes batch views. The speed layer uses incremental algorithms to generate real-time views from new data. Queries are resolved by merging results from the batch and real-time views. Recommendations are made to leverage complex event processing and stream processing techniques to more efficiently construct views and handle merging and querying across layers.
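A toy sketch of the query-time merge at the heart of the Lambda architecture, assuming a simple counting query; view names and events are invented.

```python
# Toy Lambda-architecture merge for a counting query. The batch view
# is precomputed from historical data; the speed layer maintains an
# incremental real-time view; queries merge the two.
from collections import Counter

batch_view = Counter({"page/a": 1000, "page/b": 250})  # from batch layer

speed_view = Counter()

def on_event(page: str) -> None:
    """Speed layer: incrementally update the real-time view."""
    speed_view[page] += 1

def query(page: str) -> int:
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[page] + speed_view[page]

for event in ["page/a", "page/c", "page/a"]:
    on_event(event)

print(query("page/a"))  # 1002
print(query("page/c"))  # 1
```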
Overview of the SPARQL-Generate language and latest developments - Maxime Lefrançois
SPARQL-Generate is an extension of SPARQL 1.1 for querying not only RDF datasets but also documents in arbitrary formats. The solution bindings can then be used to output RDF (SPARQL-Generate) or text (SPARQL-Template).
Anyone familiar with SPARQL can easily learn SPARQL-Generate; learning SPARQL-Generate helps you learn SPARQL.
The open-source implementation (Apache 2 license) is based on Apache Jena and can be used to execute transformations from a combination of RDF and any kind of documents in XML, JSON, CSV, HTML, GeoJSON, CBOR, streams of messages using WebSocket or MQTT... (easily extensible)
Recent extensions and improvements include:
- heavy refactoring to support parallelization
- more expressive iterators and functions
- simple generation of RDF lists
- support of aggregates
- generation of HDT (thanks Ana for the use case)
- partial implementation of STTL for the generation of Text (https://ns.inria.fr/sparql-template/)
- partial implementation of LDScript (http://ns.inria.fr/sparql-extension/)
- integration of all these types of rules to decouple or compose queries, e.g.:
- call a SPARQL-Generate query in the SPARQL FROM clause
- plug a SPARQL-Generate or a SPARQL-Template query to the output of a SPARQL-Select function
- a Sublime Text package for local development
Incorporating Functions in Mappings to Facilitate the Uplift of CSV Files int... - Ademar Crotti Junior
This document proposes incorporating functions into mappings to facilitate the conversion of CSV files to RDF. It defines functions as resources with names and bodies that can be used to capture domain knowledge and manipulate data during the conversion. The implementation extends existing R2RML and RML specifications by allowing functions to be called from mapping rules. Examples demonstrate mapping CSV data to RDF using functions to transform values into valid URIs.
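As an illustration of the idea (not of the paper's actual R2RML/RML extension), the following Python sketch applies a named function inside a CSV-to-RDF mapping rule to turn raw values into valid URIs; column names and vocabulary are invented.

```python
# Sketch of function-backed uplift from CSV to RDF (illustrating the
# idea, not the paper's R2RML/RML extension). A mapping rule names a
# function that cleans a raw value before it becomes part of a URI.
import csv
import io
import urllib.parse

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

def to_safe_uri(value: str):
    """Mapping function: normalize and percent-encode a CSV value."""
    return EX[urllib.parse.quote(value.strip().lower().replace(" ", "-"))]

data = io.StringIO("name,population\nNew York,8500000\n")
g = Graph()
for row in csv.DictReader(data):
    subject = to_safe_uri(row["name"])        # function call in the rule
    g.add((subject, EX.population, Literal(int(row["population"]))))

print(g.serialize(format="turtle"))
```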
The millions of people that use Spotify each day generate a lot of data, roughly a few terabytes per day. What does it take to handle datasets of that scale, and what can be done with it? I will briefly cover how Spotify uses data to provide a better music listening experience, and to strengthen their business. Most of the talk will be spent on our data processing architecture, and how we leverage state of the art data processing and storage tools, such as Hadoop, Cassandra, Kafka, Storm, Hive, and Crunch. Last, I'll present observations and thoughts on innovation in the data processing aka Big Data field.
Presentation on RDF Stream Processing models given at the SR4LD tutorial (ISWC 2013) -- updated version at: http://www.slideshare.net/dellaglio/rsp2014-01rspmodelsss
This document discusses RDF stream processing and the role of semantics. It begins by outlining common sources of streaming data on the internet of things. It then discusses challenges of querying streaming data and existing approaches like CQL. Existing RDF stream processing systems are classified based on their query capabilities and use of time windows and reasoning. The role of linked data principles and HTTP URIs for representing streaming sensor data is discussed. Finally, requirements for reactive stream processing systems are outlined, including keeping data moving, integrating stored and streaming data, and responding instantaneously. The document argues that building relevant RDF stream processing systems requires going beyond existing requirements to address data heterogeneity, stream reasoning, and optimization.
Unified Big Data Processing with Apache Spark - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1yNuLGF.
Matei Zaharia talks about the latest developments in Spark and shows examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code. Filmed at qconsf.com.
Matei Zaharia is an assistant professor of computer science at MIT, and CTO of Databricks, the company commercializing Apache Spark.
Build a Time Series Application with Apache Spark and Apache HBase - Carol McDonald
This document discusses using Apache Spark and Apache HBase to build a time series application. It provides an overview of time series data and requirements for ingesting, storing, and analyzing high volumes of time series data. The document then describes using Spark Streaming to process real-time data streams from sensors and storing the data in HBase. It outlines the steps in the lab exercise, which involves reading sensor data from files, converting it to objects, creating a Spark Streaming DStream, processing the DStream, and saving the data to HBase.
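A condensed PySpark Streaming sketch of that pipeline shape; the input path, CSV layout, and HBase table and column names are placeholders, and HBase is reached through the third-party happybase client rather than the Java API the original lab uses.

```python
# Condensed PySpark Streaming sketch of the described pipeline: read
# sensor CSV lines from a watched directory, parse them, and write each
# reading to HBase. Paths, table and column names are placeholders.
import happybase  # third-party HBase client: pip install happybase
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def save_partition(rows):
    conn = happybase.Connection("localhost")  # one connection per partition
    table = conn.table("sensor")
    for sensor_id, ts, value in rows:
        table.put(f"{sensor_id}-{ts}".encode(),
                  {b"data:value": value.encode()})
    conn.close()

sc = SparkContext(appName="sensor-to-hbase")
ssc = StreamingContext(sc, batchDuration=5)

lines = ssc.textFileStream("hdfs:///sensors/incoming")   # new files only
readings = lines.map(lambda line: line.split(","))       # id, ts, value
readings.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

ssc.start()
ssc.awaitTermination()
```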
This document proposes an approach to enable ontology-based access to streaming data sources. It discusses mapping streaming data schemas to ontological concepts and extending SPARQL to support querying streaming RDF data. This would allow expressing continuous queries over streaming data using ontological terms. The approach includes translating such SPARQL queries to queries over streaming data sources using mappings between the ontology and streaming schemas. An implementation of a semantic integration service is proposed to deploy this ontology-based access to streaming data.
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti... - eswcsummerschool
Ontotext is a leading semantic technology company that has developed OWLIM, a family of semantic repositories for storing and querying RDF and OWL data. OWLIM can handle large datasets, perform reasoning, and supports features like full text search, notifications, and geo-spatial querying. It has been used successfully in large-scale production systems like the BBC's World Cup website to power semantic search and dynamic content delivery using semantic web technologies.
Spark Summit EU talk by Sameer Agarwal - Spark Summit
This document discusses Project Tungsten, which aims to substantially improve the memory and CPU efficiency of Spark. It describes how Spark has optimized IO but the CPU has become the bottleneck. Project Tungsten focuses on improving execution performance through techniques like explicit memory management, code generation, cache-aware algorithms, whole-stage code generation, and columnar in-memory data formats. It shows how these techniques provide significant performance improvements, such as 5-30x speedups on operators and 10-100x speedups on radix sort. Future work includes cost-based optimization and improving performance on many-core machines.
The aim of the EU FP 7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC, for short, pronounced "lark"), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform, and introduces the platform and the project in general.
The document compares the performance of different data serialization formats (JSON, Apache Avro, Protocol Buffers) for real-time applications. It describes building a pipeline to ingest, process, and cache serialized data. Benchmark results show JSON has the highest throughput but also the highest latency, while Protocol Buffers has the lowest throughput but lowest latency. The document recommends JSON for latency-critical, small data and Protocol Buffers for data-heavy, real-time applications relying on Google services. It also provides information about monitoring throughput patterns and the presenter's background and skills.
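A minimal harness of the kind such a comparison needs; only the stdlib json codec is wired in, with Avro or Protocol Buffers pluggable behind the same encode/decode pair, and the sample record is invented.

```python
# Minimal serialization micro-benchmark harness. Only stdlib json is
# wired in; an Avro or Protocol Buffers codec can be plugged in behind
# the same (encode, decode) pair. The sample record is made up.
import json
import time

record = {"sensor": "s1", "ts": 1700000000, "values": list(range(50))}

codecs = {
    "json": (lambda r: json.dumps(r).encode(), lambda b: json.loads(b)),
    # "avro": (avro_encode, avro_decode),        # pluggable, not shown
    # "protobuf": (pb_encode, pb_decode),        # pluggable, not shown
}

N = 10_000
for name, (encode, decode) in codecs.items():
    payload = encode(record)
    start = time.perf_counter()
    for _ in range(N):
        decode(encode(record))
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(payload)} bytes, {N / elapsed:,.0f} round-trips/s")
```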
Towards efficient processing of RDF data streams - Alejandro Llaves
Presentation of short paper submitted to OrdRing workshop, held at ISWC 2014 - http://streamreasoning.org/events/ordring2014.
In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.
Towards efficient processing of RDF data streams - Alejandro Llaves
This document discusses efficient processing of RDF data streams. It proposes using the Storm distributed stream processing system and Lambda Architecture to address challenges of scalability, latency, and integrating historical and real-time data. Key components include Storm-based operators to parallelize SPARQL queries over streams, adaptive query processing to adjust to changing conditions, and an ERI compression format to reduce transmission costs for structured RDF streams. Open questions remain around parallelization and handling of out-of-order tuples.
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo... - Guido Schmutz
The document compares Spark Structured Streaming with Kafka Streams. Spark Structured Streaming runs on a Spark cluster and allows reuse of existing Spark investments, while Kafka Streams is a Java library that provides low-latency continuous processing. Both platforms support stateful operations like windows, aggregations, and joins. Spark Structured Streaming supports multiple languages but has higher latency due to micro-batching, while Kafka Streams currently supports only Java.
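For the Spark side, a short PySpark Structured Streaming sketch of the Kafka-sourced, windowed aggregation both platforms support; the broker address and topic are placeholders, and the Kafka source additionally requires the spark-sql-kafka package on the classpath.

```python
# PySpark Structured Streaming sketch: consume a Kafka topic and count
# events per 1-minute window. Broker and topic names are placeholders;
# the Kafka source needs the spark-sql-kafka-0-10 package available.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("kafka-windowed-counts").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

counts = (events
          .selectExpr("CAST(value AS STRING) AS value", "timestamp")
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```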
This document provides an overview of stream reasoning and discusses Riccardo Tommasini's master's thesis. It outlines Tommasini's background, research interests, and research group in stream reasoning. It also presents an example of how stream reasoning can be used to query streaming sensor data to determine who is in what room. Tommasini's goal is to enable systematic comparative evaluation of RSP engines using standardized queries, datasets, and metrics within a test stand framework.
This document discusses Timbuctoo, an application designed for academic research that allows for complex and heterogeneous data. It explores archiving RDF datasets from Timbuctoo instances, including handling RDF graphs and triples, versioning datasets, and verifying dataset integrity and resolving links. A potential pipeline is proposed to ingest datasets from Timbuctoo into the EASY archive, but current Timbuctoo instances and datasets have obscure URIs and insufficient metadata, and the prototype pipeline lacks specifications. Archiving linked data from Timbuctoo could change the nature of preservation for archives.
Introduction to Data streaming - 05/12/2014 - Raja Chiky
Raja Chiky is an associate professor whose research interests include data stream mining, distributed architectures, and recommender systems. The document outlines data streaming concepts including what a data stream is, data stream management systems, and basic approximate algorithms used for processing massive, high-velocity data streams. It also discusses challenges in distributed systems and using semantic technologies for data streaming.
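One concrete example of the basic approximate algorithms mentioned is reservoir sampling, which keeps a uniform random sample of fixed size over an unbounded stream in a single pass:

```python
# Reservoir sampling: maintain a uniform random sample of k items from
# a stream of unknown length, in one pass and O(k) memory. Each item
# ends up in the reservoir with probability k/n.
import random

def reservoir_sample(stream, k):
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = random.randint(0, n)  # inclusive; keep item w.p. k/(n+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1_000_000), k=5))
```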
1) The document discusses the use of RaptorQ coding in data center networks to address various traffic patterns like incast, one-to-many, and many-to-one flows.
2) RaptorQ codes allow symbols to be sprayed across multiple paths and receivers can reconstruct the data from any subset of symbols. This enables efficient handling of multi-path, multi-source and multicast traffic.
3) Evaluation results show that RaptorQ coding improves throughput compared to TCP, especially in scenarios with incast traffic or multiple senders transmitting to a receiver. The rateless property and resilience to packet loss makes it well-suited for data center network traffic.
Mining and Managing Large-scale Linked Open Data - MOVING Project
Linked Open Data (LOD) is about publishing and interlinking data of different origin and purpose on the web. The Resource Description Framework (RDF) is used to describe data on the LOD cloud. In contrast to relational databases, RDF does not provide a fixed, pre-defined schema. Rather, RDF allows for flexibly modeling the data schema by attaching RDF types and properties to the entities. Our schema-level index called SchemEX allows for searching in large-scale RDF graph data. The index can be efficiently computed with reasonable accuracy over large-scale data sets with billions of RDF triples, the smallest information unit on the LOD cloud. SchemEX is highly needed as the size of the LOD cloud quickly increases. Due to the evolution of the LOD cloud, one observes frequent changes of the data. We show that also the data schema changes in terms of combinations of RDF types and properties. As changes cannot capture the dynamics of the LOD cloud, current work includes temporal clustering and finding periodicities in entity dynamics over large-scale snapshots of the LOD cloud with about 100 million triples per week for more than three years.
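A toy sketch of the schema-level indexing idea behind SchemEX: entities are bucketed by the combination of their RDF types and used properties; the triples are invented, and the real index adds stream-based computation over billions of triples.

```python
# Toy schema-level index in the spirit of SchemEX: group entities by
# the combination of their rdf:type values and the properties they use.
# The triples below are invented.
from collections import defaultdict

RDF_TYPE = "rdf:type"
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:acme", RDF_TYPE, "org:Organization"),
]

types, props = defaultdict(set), defaultdict(set)
for s, p, o in triples:
    if p == RDF_TYPE:
        types[s].add(o)
    else:
        props[s].add(p)

# Entities with the same (types, properties) signature share a bucket.
index = defaultdict(set)
for s in set(types) | set(props):
    index[(frozenset(types[s]), frozenset(props[s]))].add(s)

for (ts, ps), entities in index.items():
    print(sorted(ts), sorted(ps), "->", sorted(entities))
```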
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach... - NETWAYS
How to store billions of time series points and access them within a few milliseconds? Chronix!
Chronix is a young but mature open source project that allows one, for example, to store about 15 GB (CSV) of time series in 238 MB with average query times of 21 ms. Chronix is built on top of Apache Solr, a bulletproof distributed NoSQL database with impressive search capabilities. In this code-intense session we show how Chronix achieves its efficiency in both respects by means of ideal chunking, by selecting the best compression technique, by enhancing the stored data with (pre-computed) attributes, and by specialized query functions.
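A back-of-the-envelope sketch of the two core tricks named above, chunking plus compression, using delta encoding and zlib from the Python stdlib; the ratio it prints is not Chronix's number, since Chronix adds Solr indexing, pre-computed attributes, and tuned codecs.

```python
# Back-of-the-envelope sketch of Chronix-style storage: split a time
# series into chunks, delta-encode timestamps, and compress each chunk.
# Chronix itself adds Solr indexing, pre-computed attributes, and tuned
# codecs; this only illustrates why chunk+compress shrinks the data.
import struct
import zlib

# Regular 1-second samples: deltas are tiny and compress extremely well.
points = [(1700000000 + i, 20.0 + (i % 10) * 0.1) for i in range(100_000)]

def compress_chunk(chunk):
    first_ts = chunk[0][0]
    deltas = [t - prev for (t, _), (prev, _) in zip(chunk[1:], chunk)]
    raw = struct.pack(f"<q{len(deltas)}i{len(chunk)}d",
                      first_ts, *deltas, *[v for _, v in chunk])
    return zlib.compress(raw, level=9)

CHUNK = 10_000
chunks = [compress_chunk(points[i:i + CHUNK])
          for i in range(0, len(points), CHUNK)]
raw_size = len(points) * 16          # 8-byte ts + 8-byte value
packed = sum(len(c) for c in chunks)
print(f"raw {raw_size:,} B -> compressed {packed:,} B")
```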
Similar to OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing
Organisational Interoperability in Practice at Universidad Politécnica de Madrid - Oscar Corcho
Presentation on EOSC Interoperability Framework in relation to Organisational Interoperability, and how it can be applied to a Research Performing Organisation such as UPM
Open Data (and Software, and other Research Artefacts) - A proper management - Oscar Corcho
Presentation at the event "Let's do it together: How to implement Open Science Practices in Research Projects" (29/11/2019), organised by Universidad Politécnica de Madrid, where we discuss the need to take into account not only open access or open research data, but also all the other artefacts that result from our research processes.
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos - Oscar Corcho
This presentation was given in the context of the Conference on dissemination, accessibility and reuse of official statistics and cartography (http://www.juntadeandalucia.es/institutodeestadisticaycartografia/blog/2019/11/jornada-plan/), organised by the Instituto de Estadística y Cartografía de Andalucía.
Ontology Engineering at Scale for Open City Data Sharing - Oscar Corcho
Seminar at the School of Informatics, The University of Edinburgh.
In this talk we will present how we are applying ontology engineering principles and tools for the development of a set of shared vocabularies across municipalities in Spain, so that they can start homogenising the generation and publication of open data that may be useful for their own internal reuse, as well as for third parties who want to develop applications that reuse open data once and can be deployed for all municipalities. We will discuss the main challenges for ontology engineering that arise in this setting, and present the work we have done to integrate ontology development tools into the common software development infrastructure used by those who are not experts in Ontology Engineering.
Situación de las iniciativas de Open Data internacionales (y algunas recomen... - Oscar Corcho
Presentation on international and national Open Data initiatives, given in the context of the Universidad de Extremadura summer course "BigData y Machine Learning junto a fuentes de datos abiertos para especializar el sector agroganadero", on 25/09/2018.
General presentation, in Spanish, on light pollution from the STARS4ALL project (www.stars4all.eu). Produced by the project consortium, with special thanks to Lucía García (@shekda) for creating the first English version, and to Miquel Serra-Ricart for its initial translation.
Towards Reproducible Science: a few building blocks from my personal experience - Oscar Corcho
Invited keynote given at the Second International Workshop on Semantics for BioDiversity (http://fusion.cs.uni-jena.de/s4biodiv2017/), held in conjunction with ISWC2017 (https://iswc2017.semanticweb.org/)
Publishing Linked Statistical Data: Aragón, a case study - Oscar Corcho
Presentation at the Semstats2017 workshop (http://semstats.org/2017/) for the paper "Publishing Linked Statistical Data: Aragón, a Case Study", by Oscar Corcho, Idafen Santana-Pérez, Hugo Lafuente, David Portolés, César Cano, Alfredo Peris, José María Subero.
An initial analysis of topic-based similarity among scientific documents base... - Oscar Corcho
This document analyzes the representativeness of different parts of scientific documents, including abstracts and sections related to the approach, outcome, and background. It finds that summaries created from the approach, outcome, or background better represent the full document and related documents than abstracts, based on measures of internal and external representativeness. Future work will use probabilistic topic models better suited to short texts.
This document discusses linked statistical data and its benefits. It provides an overview of key concepts like open data, linked data, and the W3C RDF DataCube specification. It also presents a case study of a statistical office in Aragon, Spain that has published local government data as linked open data. Publishing data this way allows for easier reuse by both internal and external developers. It facilitates integration with other datasets and enables complex queries across multiple sources. Overall, representing statistical data using semantic web standards like RDF DataCube provides advantages for data sharing and reuse.
Aplicando los principios de Linked Data en AEMET - Oscar Corcho
This document describes a 2011 project to publish the meteorological data of Spain's state meteorological agency (AEMET) as Linked Data. It details how weather station data and observations were mapped to ontologies and published on the Web of Data. Although the website is no longer maintained, the document discusses possible next steps such as integrating Linked Data principles into AEMET's open data API.
Ojo Al Data 100 - Call for sharing session at IODC 2016 - Oscar Corcho
This is the presentation of the #ojoaldata100 initiative (http://ojoaldata100.okfn.es) for the selection of 100 datasets that every city should be publishing in their open data portal. This presentation was used in a call for sharing session at the 4th International Open Data Conference (IODC2016).
Educando sobre datos abiertos: desde el colegio a la universidad - Oscar Corcho
Presentation given in panel 3 of the Aporta 2016 event, one of the pre-events of the open data week in Madrid, on 3 October 2016.
http://datos.gob.es/encuentro-aporta?q=node/654503
STARS4ALL general presentation at ALAN2016 - Oscar Corcho
The STARS4ALL project aims to create a platform to support Light Pollution Initiatives (LPIs) through citizen-based sensor data acquisition, games, and funding. LPIs address issues like loss of night sky visibility, environmental and economic impacts of excess light, and threats to species. The project will select up to 10 new LPIs in year 2, offering technical support and a small travel budget to address issues like energy efficiency, astronomy, health, and biodiversity. Citizens, organizations, and local authorities are encouraged to propose their own LPI or join the External Citizen Activist Team.
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística - Oscar Corcho
In this presentation we show the work carried out to generate and publish linked data from the local statistics data of the Instituto Aragonés de Estadística.
Presentación de la red de excelencia de Open Data y Smart Cities - Oscar Corcho
General presentation of the network of excellence on Open Data and Smart Cities (http://www.opencitydata.es), given at Medialab-Prado on 18 February 2016.
Why do they call it Linked Data when they want to say...? - Oscar Corcho
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing with outsiders about the goodness of Linked Data but also when reviewing papers for the COLD workshop series, I find myself, on many occasions, going back again to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Linked Statistical Data: does it actually pay off? - Oscar Corcho
Invited keynote at the ISWC2015 Workshop on Semantics and Statistics (SemStats 2015). http://semstats.github.io/2015/
The release of the W3C RDF Data Cube recommendation was a significant milestone towards improving the maturity of the area of Linked Statistical Data. Many Data Cube-based datasets have been released since then. Tools for the generation and exploitation of such datasets have also appeared. While the benefits of using RDF Data Cube and generating Linked Data in this area seem clear, there are still many challenges associated with the generation and exploitation of such data. In this talk we will reflect on them, based on our experience generating and exploiting this type of data, and hopefully provoke some discussion about what the next steps should be.
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
The document discusses slow-cooked data and APIs from a city perspective. It draws an analogy between big data/fast food and slow-cooked/linked open data. It outlines six rules for slow-cooking data: 1) appropriately segment datasets, 2) annotate data with semantics, 3) provide multiple data formats, 4) engage children in data contribution and use, 5) use open data internally before publishing, and 6) leverage common data structures for interoperability like fast food franchises do. The goal is to cook open data in a way that is both useful and reusable.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the slides accompanying the talk I gave on the main changes introduced by CCS TSI 2023 at the largest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants, with 200 following online.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Webinar: Designing a schema for a Data WarehouseFederico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which first requires gathering information about the business processes that need to be analysed. These processes must then be translated into so-called star schemas: denormalised schemas where each table represents a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
HCL Notes and Domino license cost reduction in the world of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also approaches that can lead to unnecessary expenses, e.g. when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Processing
1. On the need for a W3C community group on RDF Stream Processing
ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013
Oscar Corcho
ocorcho@fi.upm.es, ocorcho@localidata.com
@ocorcho
http://www.slideshare.net/ocorcho/
2. Disclaimer…
This presentation expresses my view, but not necessarily that of the rest of the group (although I hope it is similar).
3. Acknowledgements
• All those that I have “stolen” slides, material and ideas from:
  • Emanuele Della Valle
  • Daniele Dell’Aglio
  • Marco Balduini
  • Jean Paul Calbimonte
  • And many others who have already started contributing…
4. Why set up a community group?
Heterogeneity:
• In RDF stream models (timestamps, events, time intervals, triple-based, graph-based, …)
• In RDF stream query languages (windows, stream selection, CEP-based operators, …)
• In implementations (RDF native, query rewriting, continuous query registration, scalability, static vs. streaming data, …)
• In operational semantics (tick, window content, report)
5. You may think that we do not like heterogeneity…
6. But at least I love it…
• However, we need to tell people what to expect from each system, and smooth out differences when they are not crucial…
7. The solution…
• Let’s create a W3C community group…
  • To better understand those differences
  • And the requirements on which we are based
  • And explain them to others
  • …
  • And maybe get some “recommendation” out
8. The W3C RDF Stream Processing Comm. Group
• http://www.w3.org/community/rsp/
9. W3C RSP Community Group mission
“The mission of the RDF Stream Processing Community Group (RSP) is to define a common model for producing, transmitting and continuously querying RDF Streams. This includes extensions to both RDF and SPARQL for representing streaming data, as well as their semantics. Moreover this work envisions an ecosystem of streaming and static RDF data sources whose data can be combined through standard models, languages and protocols. Complementary to related work in the area of databases, this Community Group looks at the dynamic properties of graph-based data, i.e., graphs that are produced over time and which may change their shape and data over time.”
10. Use cases
• We have started collecting them
• And I hope that by the end of my talk you will consider contributing some more…
11. A template to describe use cases (I)
• Streaming Information
  • Type: Environmental data: temperatures, pressures, salinity, acidity, fluid velocities, etc.
  • Nature:
    • Relational stream: yes
    • Text stream: no
  • Origin: Data is produced by sensors in oil wells and on oil and gas platform equipment. Each oil platform has an average of 400,000 sensors.
  • Frequency of update:
    • From sub-second to minutes
    • In triples/minute: [10000-10] t/min
  • Quality: It varies, due to instrument/sensor issues
• Management / access
  • Technology in use: Dedicated (relational and proprietary) stores
  • Problems: The ability of users to access data from different sources is limited by an insufficient description of the context
  • Means of improvement: Add context (metadata) to the data so it becomes meaningful, and use reasoning techniques to process that metadata
12. A template to describe use cases (II)
• [optional] Static information required to interpret the streaming information
  • Type: Topology of the sensor network, position of each sensor, descriptions of the oil platform
  • Origin: Oil and gas production operations
  • Dimension:
    • 100s of MB as a PostGIS dump
    • In triples: 10^8
  • Quality: Good
  • Management / access
    • Technology in use: RDBMS, proprietary technologies
    • Available ontologies and vocabularies: Reference Semantic Model (RSM), based on ISO 15926
13. A tale of four heterogeneities
ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013
Oscar Corcho
15. What is an RDF stream?
• Several possibilities:
  • An RDF stream is an infinite sequence of timestamped events (triples or graphs), where timestamps are non-decreasing: … <event_i, t_i> <event_i+1, t_i+1> <event_i+2, t_i+2> …
  • An RDF stream is an infinite sequence of triple occurrences <<s,p,o>, tα, tω>, where <s,p,o> is an RDF triple and tα and tω are the start and end of the interval
• How are timestamps assigned?
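As a concrete illustration of the first option, a point-in-time RDF stream can be written down as a sequence of timestamped graphs. The following is a minimal sketch in a TriG-like notation with a made-up example.org vocabulary; no standard serialization for the timestamps existed at the time, so attaching them as annotations is only one of several possibilities:

    @prefix : <http://example.org/> .
    :g1 { :alice :isWith :bob . }    # timestamp t = 1
    :g2 { :alice :isWith :carl . }   # timestamp t = 3
    :g3 { :bob   :isWith :diana . }  # timestamp t = 6

Under the interval-based reading, each occurrence would instead carry two timestamps, e.g. <:alice :isWith :bob> with tα = 1 and tω = 3.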
16. Some examples…
• What would be the best/possible RDF stream representation for the following types of problems?
  • Does Alice meet Bob before Carl?
  • Who does Carl meet first?
  • How many people has Alice met in the last 5m?
  • Does Diana meet Bob and then Carl within 5m?
  • Which are the meetings that last less than 5m?
  • Which are the meetings with conflicts?
[Figure: four events e1–e4 on a timeline at t = 1, 3, 6 and 9, carrying the triples :alice :isWith :bob, :alice :isWith :carl, :bob :isWith :diana and :diana :isWith :carl]
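Some of these questions are answerable with a window-based language alone. As a hedged sketch, the question “How many people has Alice met in the last 5m?” could be expressed in C-SPARQL roughly as follows, assuming a hypothetical stream IRI http://example.org/meetings and the :isWith vocabulary from the slide:

    REGISTER QUERY AliceMeetings AS
    PREFIX : <http://example.org/>
    SELECT (COUNT(DISTINCT ?person) AS ?n)
    FROM STREAM <http://example.org/meetings> [RANGE 5m STEP 1m]
    WHERE { :alice :isWith ?person }

The sequence-oriented questions (Bob before Carl, two meetings within 5m of each other), by contrast, need CEP-style operators such as EP-SPARQL’s SEQ, which is exactly the kind of heterogeneity the group set out to map.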
17. Data types for semantic streams - Summary
• Multiple notions of RDF stream proposed:
  • Ordered sequence (implicit timestamp)
  • One timestamp per triple (point-in-time semantics)
  • Two timestamps per triple (interval-based semantics)
• Comparison between existing approaches:

  System                 | Data item | Time model    | # of timestamps
  INSTANS                | triple    | Implicit      | 0
  C-SPARQL               | triple    | Point in time | 1
  SPARQLstream           | triple    | Point in time | 1
  CQELS                  | triple    | Point in time | 1
  Sparkwave              | triple    | Point in time | 1
  Streaming Linked Data  | RDF graph | Point in time | 1
  ETALIS                 | triple    | Interval      | 2

• More investigation is required to agree on an RDF stream model
19. Existing RDF Stream Processing systems
• C-SPARQL: RDF store + stream processor
  • Combined architecture: a translator splits the C-SPARQL query into a static part, evaluated by an RDF store, and a streaming part, evaluated by a stream processor; together they produce continuous results
• CQELS: implemented from scratch, with a focus on performance
  • Native RSP engine with adaptive joins between static and streaming data, producing continuous results
• CQELS-Cloud: reusing Storm
  • The CQELS query is compiled into a Storm topology that produces continuous results
  • Paper presentation on Thursday
20. Existing RSP systems
• EP-SPARQL: complex-event detection
  • SEQ, EQUALS operators
  • A translator turns the EP-SPARQL query into a Prolog program; the Prolog engine produces the continuous results
• SPARQLstream: ontology-based stream query answering
  • Virtual RDF views, using R2RML mappings
  • SPARQL stream queries over the original data streams: a rewriter uses the R2RML mappings to push the query down to the underlying DSMS/CEP, which produces the continuous results
• INSTANS: RETE-based evaluation
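To give a feel for the rewriting approach, a SPARQLstream query is written against a virtual RDF stream and then translated, via the R2RML mappings, into queries over the underlying DSMS/CEP. A rough sketch with a hypothetical sensor stream and vocabulary (the exact window syntax varies across SPARQLstream versions):

    PREFIX s: <http://example.org/sensors#>
    SELECT ?sensor ?temp
    FROM NAMED STREAM <http://example.org/observations> [NOW - 5 MINUTES TO NOW]
    WHERE { ?sensor s:hasTemperature ?temp . FILTER (?temp > 30) }

The consumer only sees the RDF-level query; the mappings decide how it becomes, say, a continuous query over the native stream engine.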
21. Query languages for semantic streams - Summary
• Different architectural choices
  • It is not yet clear which choice is best for which type of use case
• Wrappers over existing systems
  • C-SPARQL, ETALIS, SPARQLstream, CQELS-Cloud
  • Better reliability and maintainability?
• Native implementations
  • CQELS, Streaming Linked Data, INSTANS
  • Better scalability: optimizations that are not possible in other systems
• Different operational semantics
  • See later
23. Querying data streams (from CQL to SPARQL-X)
The CQL model distinguishes three classes of operators:
• Stream-to-relation (S2R): window operators map a stream (an infinite, unbounded bag of timestamped elements <s, τ>) to a relation R(t) (a finite bag; the mapping T → R)
• Relation-to-relation (R2R): ordinary relational operators transform relations into relations
• Relation-to-stream (R2S): operators turn the relation back into a stream <s1>, <s2>, <s3>, …
The same pipeline carries over to RDF: S2R window operators over RDF streams, SPARQL operators as R2R, and R2S operators producing RDF streams again.
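Mapped onto a concrete query, the three operator classes line up as in the sketch below. This is a generic SPARQL-X style example, mixing C-SPARQL’s FROM STREAM window clause with the R2S operators of the next slides, so no single engine accepts it verbatim; the stream IRI is hypothetical:

    REGISTER STREAM RecentMeetings AS
    PREFIX : <http://example.org/>
    CONSTRUCT RSTREAM { ?a :metRecently ?b }                       # R2S: relation back to a stream
    FROM STREAM <http://example.org/meetings> [RANGE 5m STEP 1m]   # S2R: time-based window
    WHERE { ?a :isWith ?b }                                        # R2R: plain SPARQL over the window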
24. Output: relation
• Case 1: the output is a set of timestamped mappings
  • A registered SELECT query (SELECT ?a ?b … FROM … WHERE …) makes the RSP engine emit timestamped bindings: ?a … ?b … [t1], ?a … ?b … [t3], ?a … ?b … [t5], ?a … ?b … [t7], …
  • A registered CONSTRUCT query (CONSTRUCT { ?a :prop ?b } FROM … WHERE …) emits timestamped triples instead: <… :prop …> [t1], <… :prop …> [t3], <… :prop …> [t5], <… :prop …> [t7], …
25. Output: stream
• Case 2: the output is a stream, produced through R2S operators
  • e.g. CONSTRUCT RSTREAM { ?a :prop ?b } FROM … WHERE … makes the RSP engine emit a stream of timestamped triples: … <… :prop …> [t1], <… :prop …> [t1], <… :prop …> [t3], <… :prop …> [t5], <… :prop …> [t7], …
• ISTREAM: stream out data in the last step that wasn’t in the previous step
• DSTREAM: stream out data in the previous step that isn’t in the last step
• RSTREAM: stream out all data in the last step
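A minimal worked example of the three R2S operators, assuming the query’s answer relation contains {A, B} at step t and {B, C} at step t+1:
• ISTREAM emits C at t+1 (new in the last step)
• DSTREAM emits A at t+1 (dropped since the previous step)
• RSTREAM emits B and C at t+1 (the whole last step)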
26. Other operators
• Sequence operators and the CEP world
[Figure: events e1–e4 on a timeline at t = 1, 3, 6 and 9, illustrating sequential vs. simultaneous occurrence]
• SEQ: joins e[ti,tf] and e'[ti',tf'] if e' occurs after e
• EQUALS: joins e[ti,tf] and e'[ti',tf'] if they occur simultaneously
• OPTIONALSEQ, OPTIONALEQUALS: optional join variants
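In EP-SPARQL these operators appear directly in the query. A hedged sketch of “Does Diana meet Bob and then Carl within 5m?” using the SEQ operator and EP-SPARQL’s getDURATION() function (the exact duration-literal and grammar details differ between versions):

    PREFIX : <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?who
    WHERE { { :diana :isWith :bob } SEQ { :diana :isWith ?who }
            FILTER (getDURATION() < "P0DT0H5M"^^xsd:duration) }

Here the SEQ join succeeds only if the second meeting occurs after the first, per the definition above, and the FILTER restricts the overall interval of the compound event; ?who binds to :carl when the sequence happens within five minutes.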
27. Query languages for semantic streams - Summary
• Comparison between existing approaches:

  System                 | S2R                       | R2R              | Time-aware                                                    | R2S
  INSTANS                | Based on time events      | SPARQL update    | Based on time events                                          | Ins only
  C-SPARQL Engine        | Logical and triple-based  | SPARQL 1.1 query | Timestamp function                                            | Batch only
  SPARQLstream           | Logical and triple-based  | SPARQL 1.1 query | No                                                            | Ins, batch, del
  CQELS                  | Logical and triple-based  | SPARQL 1.1 query | No                                                            | Ins only
  Sparkwave              | Logical                   | SPARQL 1.0       | No                                                            | Ins only
  Streaming Linked Data  | Logical and graph-based   | SPARQL 1.1       | No                                                            | Batch only
  ETALIS                 | No                        | SPARQL 1.0       | SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES | Ins only

• Is it time to converge on a standard?
28. Query languages for semantic streams - Issues
• Different syntax for the S2R operator
• The semantics of the query languages are similar, but not identical
• Lack of an R2S operator in some cases
• Different support for time-aware operators
31. Operational Semantics
• Example query: where are both alice and bob in the last 5s?
[Figure: a stream S of :isIn observations placing :alice and :bob in :hall and then in :kitchen at times 1, 3, 6 and 9, evaluated over successive windows S1–S4]
• Two systems answering the same query report different results:
  • System 1: :hall [5], :kitchen [10]
  • System 2: :hall [3], :kitchen [9]
• Both correct? See “On Correctness in RDF Stream Processor Benchmarking” by Daniele Dell’Aglio, Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle, in the ISWC 2013 evaluation track.
33. Next steps in the community group…
• Agree on an RDF model?
  • Metamodel?
  • Timestamps in graphs?
  • Timestamp intervals
  • Compatibility with normal (static) RDF
• Additional operators for SPARQL?
  • Windows (not only time-based?)
  • CEP operators
  • Semantics
• Go Web
  • Volatile URIs
  • Serialization: terse, compact
  • Protocols: HTTP, WebSockets?
34. On the need for a W3C community group on RDF Stream Processing
ISWC2013 Workshop on Ordering and Reasoning, Sydney, 22/10/2013
Oscar Corcho
ocorcho@fi.upm.es, ocorcho@localidata.com
@ocorcho
http://www.slideshare.net/ocorcho/