SlideShare une entreprise Scribd logo
1  sur  45
Télécharger pour lire hors ligne
BIG DATA EUROPE PLATFORM
3rd BDE Workshop for Energy in WindEurope Conference in Amsterdam28 November 2017
Architecture, Components,
How to Use, and Use Cases
Ivan Ermilov
University of Leipzig/InfAI
iermilov@informatik.uni-leipzig.de
Talk outline
 The Big Data Integrator platform
o Technical Architecture
o Components & Interfaces
 How To: Installation and Usage
 Use Cases
2
BigDataEurope: Summary
Horizon2020 project
17 partners
7 pilots in various domains
> 30 Big Data components
> 350 stars on github
3
Big Data Integrator
Platform Goals
 Open source
 Simple to get started with Big Data
 Support a variety of use cases
 Embrace emerging Big Data technologies
 Simple integration with custom components
5
6
Architecture
 Big Data Integrator (BDI):
o The prototype developed by BDE
 Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer
7
Docker containers
 Docker offers lightweight virtualization
o Docker containers can be shared to be provisioned on different
Linux variations and versions
 Identical base sys
not required
 All BDI components:
Docker containers
8
Platform Architecture
9
Platform Architecture
10
Platform Architecture
11
Example: Distributed Triplestore
09 October 2017
 docker-compose up
 docker stack deploy
13
Example: Distributed Triplestore
09 October 2017
 Demo
BDI components
 Processing and storage components
o Re-used existing Docker containers where available
o Dockerized by BDE where not
o Ensured all can be provisioned through Docker Swarm
 Components by BDE:
o Support Layer
o Semantic Layer
14
BDE Docker Containers
15
Extraction & Processing
16
Apache Flume Collection and aggregation of data streams
Apache Kafka Distributed streaming platform (i.e. messaging)
Apache Spark MapReduce framework
Apache Flink MapReduce framework
Apache Hadoop MapReduce framework
Fox RDF extraction from text
Silk Link discovery
LIMES Link discovery
Data Storage
17
Apache Hadoop Distributed raw data storage
Apache Hive SQL over Hadoop
Apache HBase Distributed SQL store
Postgresql SQL database
MongoDB NoSQL Document database
Apache Cassandra Distributed database
Virtuoso Triplestore
Strabon Spatiotemporal triplestore
Ontario Semantic Data Lake
4store Triplestore
Data Indexing & Coordination
18
Apache Solr Data index based on Apache Lucene
Elasticsearch Distributed search and analytics engine
Apache Zookeeper Distributed key/value store
Semagrow SPARQL federation engine
Visualization & User Interfaces
19
Apache Zeppelin Interactive notebooks for Spark
Spark Notebook Interactive notebooks for Spark
Cloudera Hue (FB) Web based filebrowser for HDFS
BDI IDE BDE stack of interfaces for Swarm UI, application interfaces etc.
Kibana Visualization for Elasticsearch
Big Data Stack
 Stack
o Collection of components
o Solving specific problem
 One (or several) docker-compose files
o Include all component configuration (12 factor apps)
o Communications and application topology
09 October 2017
Example: Interactive Spark
 Demo
09 October 2017
BDI Stack Lifecycle
App templates
Ready-made components,
best practices
BDI Stack
Builder
BDI Support
Components,
Instructions
Swarm UI,
docker-compose
Pipeline/Logging
Monitor
22
Platform installation
 Manual installation guide
Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
Screencasts
23
BigDataEurope Use Cases24
SC1: Pharmacology research
Life
Sciences
& Health
• Extensive toolset
developed by
OPF and others
• Query a large number of datasets, some large
• Existing elaborate ingestion and homogenization
by the OpenPHACTS Foundation
25
SC1 Pilot: Points Demonstrated
Life
Sciences
& Health
• Existing distributed, scalable solution
• Based on Virtuoso
• Proprietary distributed triple store
• Porting to BDI gives flexibility
• Using Virtuoso or open source alternatives (in BDE,
4store) without development effort for the
superstructure and tools around it
26
SC2: Viticulture resources
Food and
Agriculture
• AgInfra is a major infrastructure for agriculture
researchers, serving cross-linked bibliography,
data, and processing services
• Pilot automates
ingestion and
thematic
classification of
publication full
texts
27
SC2 Pilot: Points Demonstrated
Food and
Agriculture
• AgInfra: Existing infrastructure for data,
metadata, and related services
• BDI is deployed as an external infrastructure
for processing text (viticulture publications)
• Allows storing and processing text at a larger
scale than AgInfra can currently manage
• The bibliographic metadata is added to
AgInfra
28
SC3: Predictive maintenance
Energy
• Wind turbine monitoring
applies computational
models to sensor data
streams
• Models are weekly re-
parameterized using
week’s data from multiple
turbines
29
SC3 Pilot: Points Demonstrated
Energy
• Existing in-house non-scalable solution for
model parameterization
• Reliable Fortran software for data analysis
• Efficient, but not scalable to data volume
• Developing a BDI orchestrator
• Re-uses existing software unmodified
• Makes it easy to apply in parallel to many
datasets and manage the outputs
30
SC4: Traffic conditions estimation
Transport
• Estimation of real-time traffic
conditions in Thessaloniki
• Combines:
• Traffic modelling from
historical data
• Current measurements from a
taxi fleet of 1200 vehicles
31
SC4 Pilot: Points Demonstrated
Transport
• New Flink implementations of map matching
and traffic prediction algorithms
• BDI provides access to varied data sources
• PostGIS database with city map
• ElasticSearch database of historical data
• Kafka stream of real-time data
32
SC4: Architecture
09 October 2017
23
SC5: Climate modelling
Climate
• Discovering and re-using previously
computed derivatives
• Lineage annotation: datasets and model
parameters used to compute derivative
datasets
• Finding appropriate past runs avoids
repeating weeks-long modelling runs
• Preparing modelling experiments
• Slicing, transforming, combining datasets into new datasets
• Submission to and retrieval from modelling infrastructure
34
SC5 Pilot: Points Demonstrated
Climate
• Existing infrastructure and stable, reliable
software for parallel computation of models
• BDI is deployed as an external infrastructure
for preparing and managing datasets
• BDI offers:
• Hive for managing data in a way that can be
retrieved and manipulated, rather than file blocks
• Cassandra stores structured and textual metadata
for searching headers and lineage
35
SC6: Municipality budgets
Social
Sciences
• Ingestion of budget and
budget execution data
• Multiple municipalities in
varied formats and data
models
• Homogenized data made
available for analysis and
comparison
36
SC6 Pilot: Points Demonstrated
Social
Sciences
• Existing analytics and visualization tools
• Use SPARQL queries to retrieve only the relevant
slices of the overall data
• BDI is deployed as an ingestion and storage
infrastructure for external tools
• Ingests and homogenizes a constant flow of JSON,
CSV, XML, and other formats following various
data models
• Exposes data as SPARQL endpoint serving
homogenized data, stored in 4store, a scalable,
distributed RDF store
37
SC6: Architecture
09 October 2017
25
SC7: Change detection & verification
Secure
Societies
• Events are extracted from text
published by news agencies
and on social networking sites
• Events are geo-located and
relevant changes are detected
by comparing current and
previous satellite images
39
SC7 Pilot: Points Demonstrated
Secure
Societies
• Re-implementation of change detection
algorithms for Spark
• Parallel orchestrator for text analytics
• Re-uses existing software
• Scales to many input streams
• BDI provides:
• Cassandra for text content and metadata
• Strabon GIS store for detected change location
• Homogeneous access to both for analysis and
visualization
40
SC7: Architecture
09 October 2017
27
Closing Remarks42
BDI use cases demonstrated
 Flexibility to existing workflows
o Drug discovery platform ported to new data storage infrastructure
 Scaling out
o Sensor data analysis software trivially distributed to multiple streams
o Image analysis and traffic pattern algorithms in Spark and Flink
 Preprocessing and ingestion
o Distributed data architectures pre-process and maintain big data
o Not needed simultaneously or reduced to small data
43
Semantic Web and Big Data
 Adding triple stores to the Apache ecosystem
o Scalability for Semantic Web technologies
o Semantics stores for the Apache ecosystem
 Advocating RDF technologies for describing processing and data services
o The Semantic Data Lake vision builds an RDF layer over
Swagger/OAI
 Advocating RDF and SPARQL as the lingua franca for Big Data retrieval
o Semagrow federations of heterogeneous internal and external data
stores
44
Thank you for your attention! Questions?
 BigDataEurope action Web site:
https://www.big-data-europe.eu
 Big Data Integrator:
https://github.com/big-data-europe
 Semagrow:
http://semagrow.github.io
 SANSA:
http://sansa-stack.net
45

Contenu connexe

Tendances

Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
SC5 Hangout2 pilot 1 description
SC5 Hangout2  pilot 1 descriptionSC5 Hangout2  pilot 1 description
SC5 Hangout2 pilot 1 descriptionBigData_Europe
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache ArrowWes McKinney
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
Intro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiIntro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiTaswar Bhatti
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?Örjan Lundberg
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastDatabricks
 
Data Visualization Project Presentation
Data Visualization Project PresentationData Visualization Project Presentation
Data Visualization Project PresentationShubham Shrivastava
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
 
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsRim Moussa
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials DataMarc Andersen
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expertRim Moussa
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDatabricks
 

Tendances (20)

Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
ISNCC 2017
ISNCC 2017ISNCC 2017
ISNCC 2017
 
SC5 Hangout2 pilot 1 description
SC5 Hangout2  pilot 1 descriptionSC5 Hangout2  pilot 1 description
SC5 Hangout2 pilot 1 description
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Intro elasticsearch taswarbhatti
Intro elasticsearch taswarbhattiIntro elasticsearch taswarbhatti
Intro elasticsearch taswarbhatti
 
What's new in spark 2.0?
What's new in spark 2.0?What's new in spark 2.0?
What's new in spark 2.0?
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Data Visualization Project Presentation
Data Visualization Project PresentationData Visualization Project Presentation
Data Visualization Project Presentation
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
 
Multidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGISMultidimensional Scientific Data in ArcGIS
Multidimensional Scientific Data in ArcGIS
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expert
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
HDF Update 2016
HDF Update 2016HDF Update 2016
HDF Update 2016
 
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
NASA Terra Data Fusion
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
 

Similaire à BDE SC3.3 Workshop - BDE Platform: Technical overview

BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigData_Europe
 
SC1 Workshop 2 General Introduction to BDE
SC1 Workshop 2 General Introduction to BDESC1 Workshop 2 General Introduction to BDE
SC1 Workshop 2 General Introduction to BDEBigData_Europe
 
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...BigData_Europe
 
BDE SC6 workshop - introduction 2016
BDE SC6 workshop - introduction 2016BDE SC6 workshop - introduction 2016
BDE SC6 workshop - introduction 2016BigData_Europe
 
SC7 Workshop 2: Big Data Technologies and Scenarios
SC7 Workshop 2: Big Data Technologies and ScenariosSC7 Workshop 2: Big Data Technologies and Scenarios
SC7 Workshop 2: Big Data Technologies and ScenariosBigData_Europe
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Ivan Ermilov
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...DataWorks Summit
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysKoneksys
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementStephan Haller
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Current & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightCurrent & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightabhijit2511
 

Similaire à BDE SC3.3 Workshop - BDE Platform: Technical overview (20)

BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
SC1 Workshop 2 General Introduction to BDE
SC1 Workshop 2 General Introduction to BDESC1 Workshop 2 General Introduction to BDE
SC1 Workshop 2 General Introduction to BDE
 
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar...
 
BDE SC6 workshop - introduction 2016
BDE SC6 workshop - introduction 2016BDE SC6 workshop - introduction 2016
BDE SC6 workshop - introduction 2016
 
SC7 Workshop 2: Big Data Technologies and Scenarios
SC7 Workshop 2: Big Data Technologies and ScenariosSC7 Workshop 2: Big Data Technologies and Scenarios
SC7 Workshop 2: Big Data Technologies and Scenarios
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By Koneksys
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data Management
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
NextGenML
NextGenML NextGenML
NextGenML
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Current & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylightCurrent & Future Use-Cases of OpenDaylight
Current & Future Use-Cases of OpenDaylight
 

Plus de BigData_Europe

Luigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator PlatformLuigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator PlatformBigData_Europe
 
Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4BigData_Europe
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectBigData_Europe
 
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...BigData_Europe
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...BigData_Europe
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...BigData_Europe
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
 
BDE SC3.3 Workshop - BDE review: Scope and Opportunities
 BDE SC3.3 Workshop -  BDE review: Scope and Opportunities BDE SC3.3 Workshop -  BDE review: Scope and Opportunities
BDE SC3.3 Workshop - BDE review: Scope and OpportunitiesBigData_Europe
 
BDE SC3.3 Workshop - Agenda
 BDE SC3.3 Workshop - Agenda BDE SC3.3 Workshop - Agenda
BDE SC3.3 Workshop - AgendaBigData_Europe
 
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re... BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...BigData_Europe
 
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 BDE SC3.3 Workshop - Data management in WT testing and monitoring  BDE SC3.3 Workshop - Data management in WT testing and monitoring
BDE SC3.3 Workshop - Data management in WT testing and monitoring BigData_Europe
 
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition MonitoringBigData_Europe
 
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BigData_Europe
 
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics  BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics BigData_Europe
 
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...BigData_Europe
 
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BigData_Europe
 
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BigData_Europe
 
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BigData_Europe
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 

Plus de BigData_Europe (20)

Luigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator PlatformLuigi Selmi - The Big Data Integrator Platform
Luigi Selmi - The Big Data Integrator Platform
 
Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4Josep Maria Salanova - Introduction to BDE+SC4
Josep Maria Salanova - Introduction to BDE+SC4
 
Rajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO Project
 
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
Big Data Europe SC6 WS #3: PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL, Mart...
 
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
Big Data Europe SC6 WS #3: Big Data Europe Platform: Apps, challenges, goals ...
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...
 
BDE SC3.3 Workshop - BDE review: Scope and Opportunities
 BDE SC3.3 Workshop -  BDE review: Scope and Opportunities BDE SC3.3 Workshop -  BDE review: Scope and Opportunities
BDE SC3.3 Workshop - BDE review: Scope and Opportunities
 
BDE SC3.3 Workshop - Agenda
 BDE SC3.3 Workshop - Agenda BDE SC3.3 Workshop - Agenda
BDE SC3.3 Workshop - Agenda
 
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re... BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
BDE SC3.3 Workshop - BDE Pilot case for Wind Turbine condition monitoring re...
 
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 BDE SC3.3 Workshop - Data management in WT testing and monitoring  BDE SC3.3 Workshop - Data management in WT testing and monitoring
BDE SC3.3 Workshop - Data management in WT testing and monitoring
 
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring BDE SC3.3 Workshop -  Big Data in Wind Turbine Condition Monitoring
BDE SC3.3 Workshop - Big Data in Wind Turbine Condition Monitoring
 
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
BDE SC3.3 Workshop - Options for Wind Farm performance assessment and Power f...
 
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics  BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
BDE SC3.3 Workshop - Wind Farm Monitoring and advanced analytics
 
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
 
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
BDE SC1 Workshop 3 - BigMedilytics Overview (Supriyo Chatterjea)
 
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
BDE SC1 Workshop 3 - iASiS (Guillermo Palma)
 
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)BDE SC1 Workshop 3 - MIDAS (Michaela Black)
BDE SC1 Workshop 3 - MIDAS (Michaela Black)
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 

Dernier

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Dernier (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

BDE SC3.3 Workshop - BDE Platform: Technical overview

  • 1. BIG DATA EUROPE PLATFORM 3rd BDE Workshop for Energy in WindEurope Conference in Amsterdam28 November 2017 Architecture, Components, How to Use, and Use Cases Ivan Ermilov University of Leipzig/InfAI iermilov@informatik.uni-leipzig.de
  • 2. Talk outline  The Big Data Integrator platform o Technical Architecture o Components & Interfaces  How To: Installation and Usage  Use Cases 2
  • 3. BigDataEurope: Summary Horizon2020 project 17 partners 7 pilots in various domains > 30 Big Data components > 350 stars on github 3
  • 5. Platform Goals  Open source  Simple to get started with Big Data  Support a variety of use cases  Embrace emerging Big Data technologies  Simple integration with custom components 5
  • 6. 6
  • 7. Architecture  Big Data Integrator (BDI): o The prototype developed by BDE  Main points of the architecture o Dockerization o Support layer, including integrated UI o Semantification layer 7
  • 8. Docker containers  Docker offers lightweight virtualization o Docker containers can be shared to be provisioned on different Linux variations and versions  Identical base sys not required  All BDI components: Docker containers 8
  • 12. Example: Distributed Triplestore 09 October 2017  docker-compose up  docker stack deploy 13
  • 13. Example: Distributed Triplestore 09 October 2017  Demo
  • 14. BDI components  Processing and storage components o Re-used existing Docker containers where available o Dockerized by BDE where not o Ensured all can be provisioned through Docker Swarm  Components by BDE: o Support Layer o Semantic Layer 14
  • 16. Extraction & Processing 16 Apache Flume Collection and aggregation of data streams Apache Kafka Distributed streaming platform (i.e. messaging) Apache Spark MapReduce framework Apache Flink MapReduce framework Apache Hadoop MapReduce framework Fox RDF extraction from text Silk Link discovery LIMES Link discovery
  • 17. Data Storage 17 Apache Hadoop Distributed raw data storage Apache Hive SQL over Hadoop Apache HBase Distributed SQL store Postgresql SQL database MongoDB NoSQL Document database Apache Cassandra Distributed database Virtuoso Triplestore Strabon Spatiotemporal triplestore Ontario Semantic Data Lake 4store Triplestore
  • 18. Data Indexing & Coordination 18 Apache Solr Data index based on Apache Lucene Elasticsearch Distributed search and analytics engine Apache Zookeeper Distributed key/value store Semagrow SPARQL federation engine
  • 19. Visualization & User Interfaces 19 Apache Zeppelin Interactive notebooks for Spark Spark Notebook Interactive notebooks for Spark Cloudera Hue (FB) Web based filebrowser for HDFS BDI IDE BDE stack of interfaces for Swarm UI, application interfaces etc. Kibana Visualization for Elasticsearch
  • 20. Big Data Stack  Stack o Collection of components o Solving specific problem  One (or several) docker-compose files o Include all component configuration (12 factor apps) o Communications and application topology 09 October 2017
  • 21. Example: Interactive Spark  Demo 09 October 2017
  • 22. BDI Stack Lifecycle App templates Ready-made components, best practices BDI Stack Builder BDI Support Components, Instructions Swarm UI, docker-compose Pipeline/Logging Monitor 22
  • 23. Platform installation  Manual installation guide Using Docker Machine o On local machine (VirtualBox) o In cloud (AWS, DigitalOcean, Azure) o Bare metal Screencasts 23
  • 25. SC1: Pharmacology research Life Sciences & Health • Extensive toolset developed by OPF and others • Query a large number of datasets, some large • Existing elaborate ingestion and homogenization by the OpenPHACTS Foundation 25
  • 26. SC1 Pilot: Points Demonstrated Life Sciences & Health • Existing distributed, scalable solution • Based on Virtuoso • Proprietary distributed triple store • Porting to BDI gives flexibility • Using Virtuoso or open source alternatives (in BDE, 4store) without development effort for the superstructure and tools around it 26
  • 27. SC2: Viticulture resources Food and Agriculture • AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services • Pilot automates ingestion and thematic classification of publication full texts 27
  • 28. SC2 Pilot: Points Demonstrated Food and Agriculture • AgInfra: Existing infrastructure for data, metadata, and related services • BDI is deployed as an external infrastructure for processing text (viticulture publications) • Allows storing and processing text at a larger scale than AgInfra can currently manage • The bibliographic metadata is added to AgInfra 28
  • 29. SC3: Predictive maintenance Energy • Wind turbine monitoring applies computational models to sensor data streams • Models are weekly re- parameterized using week’s data from multiple turbines 29
  • 30. SC3 Pilot: Points Demonstrated Energy • Existing in-house non-scalable solution for model parameterization • Reliable Fortran software for data analysis • Efficient, but not scalable to data volume • Developing a BDI orchestrator • Re-uses existing software unmodified • Makes it easy to apply in parallel to many datasets and manage the outputs 30
  • 31. SC4: Traffic conditions estimation Transport • Estimation of real-time traffic conditions in Thessaloniki • Combines: • Traffic modelling from historical data • Current measurements from a taxi fleet of 1200 vehicles 31
  • 32. SC4 Pilot: Points Demonstrated Transport • New Flink implementations of map matching and traffic prediction algorithms • BDI provides access to varied data sources • PostGIS database with city map • ElasticSearch database of historical data • Kafka stream of real-time data 32
  • 34. SC5: Climate modelling Climate • Discovering and re-using previously computed derivatives • Lineage annotation: datasets and model parameters used to compute derivative datasets • Finding appropriate past runs avoids repeating weeks-long modelling runs • Preparing modelling experiments • Slicing, transforming, combining datasets into new datasets • Submission to and retrieval from modelling infrastructure 34
  • 35. SC5 Pilot: Points Demonstrated Climate • Existing infrastructure and stable, reliable software for parallel computation of models • BDI is deployed as an external infrastructure for preparing and managing datasets • BDI offers: • Hive for managing data in a way that can be retrieved and manipulated, rather than file blocks • Cassandra stores structured and textual metadata for searching headers and lineage 35
  • 36. SC6: Municipality budgets Social Sciences • Ingestion of budget and budget execution data • Multiple municipalities in varied formats and data models • Homogenized data made available for analysis and comparison 36
  • 37. SC6 Pilot: Points Demonstrated Social Sciences • Existing analytics and visualization tools • Use SPARQL queries to retrieve only the relevant slices of the overall data • BDI is deployed as an ingestion and storage infrastructure for external tools • Ingests and homogenizes a constant flow of JSON, CSV, XML, and other formats following various data models • Exposes data as SPARQL endpoint serving homogenized data, stored in 4store, a scalable, distributed RDF store 37
  • 39. SC7: Change detection & verification Secure Societies • Events are extracted from text published by news agencies and on social networking sites • Events are geo-located and relevant changes are detected by comparing current and previous satellite images 39
  • 40. SC7 Pilot: Points Demonstrated Secure Societies • Re-implementation of change detection algorithms for Spark • Parallel orchestrator for text analytics • Re-uses existing software • Scales to many input streams • BDI provides: • Cassandra for text content and metadata • Strabon GIS store for detected change location • Homogeneous access to both for analysis and visualization 40
  • 43. BDI use cases demonstrated  Flexibility to existing workflows o Drug discovery platform ported to new data storage infrastructure  Scaling out o Sensor data analysis software trivially distributed to multiple streams o Image analysis and traffic pattern algorithms in Spark and Flink  Preprocessing and ingestion o Distributed data architectures pre-process and maintain big data o Not needed simultaneously or reduced to small data 43
  • 44. Semantic Web and Big Data  Adding triple stores to the Apache ecosystem o Scalability for Semantic Web technologies o Semantics stores for the Apache ecosystem  Advocating RDF technologies for describing processing and data services o The Semantic Data Lake vision builds an RDF layer over Swagger/OAI  Advocating RDF and SPARQL as the lingua franca for Big Data retrieval o Semagrow federations of heterogeneous internal and external data stores 44
  • 45. Thank you for your attention! Questions?  BigDataEurope action Web site: https://www.big-data-europe.eu  Big Data Integrator: https://github.com/big-data-europe  Semagrow: http://semagrow.github.io  SANSA: http://sansa-stack.net 45