SlideShare une entreprise Scribd logo
1  sur  82
Télécharger pour lire hors ligne
EUROPEAN BIG DATA VALUE FORUM 2020
Session 2.
A Project perspective on Big
Data architectural pipelines
and benchmarks
EUROPEAN BIG DATA VALUE FORUM 2020
Panelists
Leonidas Kallipolitis
Aegis - I-BiDaaS Projects
Brian Elvesæter
Research scientist, SINTEF
TheyBuyForYou Project
Caj SÖDERGÅRD
Professor, VTT Data-driven
Solutions
DataBio Project
Athanasios Koumparos
Senior Software Engineer,
Vodafone Innovus
Track&Know Project
Arne Berre
Chief Scientist, SINTEF
Jon Ander Gómez Adrián
Universitat Politecnica de Valencia
DeepHealth Project
Introduction to Architectural pipelines
• I-BiDaaS
• TBFY
• Track&Know
• DataBio
• DeepHealth
Data
Analytics/ML
(including data
processing for analysis,
AI and Machine
Learning)
Data Visualisation,
Action/Interaction
(including data presentation
environment/boundary/user
action and interaction)
Data
Acquisition/Collection
(including Data Ingestion
processing, Streaming, Data
Extraction, Ingestion Storage,
Different data types)
Data
Storage/Preparation
(including Storage
Retrieval/Access/Queries
Data Protection, Curation,
Integration, Publication)
Pipeline steps
Data types
Batch,
Real time,
Interactive
Processing types
- Benchmarks
- Tools/Components
- Standards
Trans-continuum
(Edge to Cloud)
IoT – Real time Data Pipeline
Knowledge Graph/Ontology/Linked Data pipeline
Realtime
Batch
Interactive
Realtime
6 data types
Conclusion on Pipelines and related benchmarks,
Arne J. Berre, SINTEF
• I-BiDaaS
• TBFY
• Track&Know
• DataBio
• DeepHealth
EUROPEAN BIG DATA VALUE FORUM 2020
Industrial-Driven
Big Data as a Self-
Service Solution
Leonidas Kallipolitis, AEGIS
EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020
Identity
Card
EUROPEAN BIG DATA VALUE FORUM 2020
1. FOUNDATION FOR RESEARCH AND TECHNOLOGY HELLAS (FORTH)
2. BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE
SUPERCOMPUTACION (BSC)
3. IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD (IBM)
4. CENTRO RICERCHE FIAT SCPA (CRF)
5. SOFTWARE AG (SAG)
6. CAIXABANK, S.A (CAIXA)
7. THE UNIVERSITY OF MANCHESTER (UNIMAN)
8. ECOLE NATIONALE DES PONTS ET CHAUSSEES (ENPC)
9. ATOS SPAIN SA (ATOS)
10. AEGIS IT RESEARCH LTD (AEGIS)
11. INFORMATION TECHNOLOGY FOR MARKET LEADERSHIP (ITML)
12. UNIVERSITY OF NOVI SAD FACULTY OF SCIENCES SERBIA
(UNSPMF)
13. TELEFONICA INVESTIGACION Y DESARROLLO SA (TID)
Consortium
EUROPEAN BIG DATA VALUE FORUM 2020
Key messages
Cope with the rate of data
asset growth
I-BiDaaS pipeline
• Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of
data analysis
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode • Tokenize data
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
• Co-develop: custom end-to-
end application
• Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of
data analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-
to-end application
• Tokenize dataFlexible solution
I-BiDaaS pipeline
• Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of
data analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-
to-end application
• Tokenize data
Data sharing
& breaking silos
I-BiDaaS pipeline
• Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of
data analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode • Tokenize data
I-BiDaaS pipeline
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
• Co-develop: custom end-to-
end application
• Expert mode
Analyze your DataUsers
• Import your data
• Self-service mode
Data
• Fabricate Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an
algorithm from the pool
Results
• Visualize the results
• Share models
Do it yourself
In a flexible
manner
Break data silos Safe environment Interact with Big Data
technologies
Increase speed of
data analysis
Cope with the rate of data
asset growth
Intra- and inter-
domain data-flow
Benefits of using I-BiDaaS
• Co-develop mode
• Co-develop: custom end-to-
end application
• Tokenize data
I-BiDaaS pipeline
Project
setup
Data
Selection
Data
preparation
Experiment
execution
Results
visualization
I-BiDaaS Experimental Workflow
Data
Analytics/ML
(including data
processing for analysis,
AI and Machine
Learning)
Data Visualisation,
Action/Interaction
(including data presentation
environment/boundary/user
action and interaction)
Data
Acquisition/Collection
(including Data Ingestion
processing, Streaming, Data
Extraction, Ingestion Storage,
Different data types)
Data
Storage/Preparation
(including Storage
Retrieval/Access/Queries
Data Protection, Curation,
Integration, Publication)
DataBench Pipeline
Project
setup
Data
Selection
Data
preparation
Experiment
execution
Results
visualization
Example: Banking Experiments Workflow
Batch & Streaming Data
Identify data collected by
the different sources
(ATMs, online banking
services, employees’
workstations, external
providers’ activity,
network devices, etc.),
stored in CAIXA datapool
Data Fabrication
Data Integration
Data Anonymization
Data Encryption
Clustering
Graph Based Analysis
Static charts for batch
analysis results
Real-time interactive
graphics for analysis
results of constantly
incoming streaming data
DataBench Pipeline
EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPsProgramming
Model(BSC)
Query Partitioning
Infrastructurelayer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource managementand orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Data ingestion
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPsRuntime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
TestData
Fabrication(IBM)Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learnedpatterns correlations
The I-
BiDaaS
solution:
Architecture
&
technologie
EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020
WP2
Data, user
interface,
visualizationMedium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
Technologies:
• IBM TDF
• SAG UM
• AEGIS AVT
http://ibidaas.eu/tools
EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020
WP3
Batch
Analytics
Technologies:
• BSC COMPSs
• BSC Hecuba
• BSC Qbeast
• Advanced ML
(UNSPMF)
http://ibidaas.eu/tools
Medium to long term business decisions
Data
Fabrication
Platform
(IBM)
Refined
specifications
for data
fabrication
GPU-accelerated Analytics
(FORTH)
Apama Complex Event
Processing (SAG)
Streaming Analytics
Batch Processing
Advanced ML (UNSPMF)
COMPs Programming
Model (BSC)
Query Partitioning
Infrastructure layer: Private cloud; Commodity cluster; GPUs
Pre-defined
Queries
SQL-like
interface
Domain
Language
Programming
API
User Interface
Resource management and orchestration (ATOS)
Advanced
Visualis.
Advanced IT services for
Big Data processing tasks;
Open source pool of ML
algorithms
Programming
Interface /
Sequential
Programming
(AEGIS+SAG)
(AEGIS)
COMPs Runtime
(BSC)
Distributed
large scale
layer
Application layer
UniversalMessaging(SAG)
DataFabrication
Platform(IBM)
Meta-
data;
Data
descri-
ption
Hecuba tools
(BSC)
Short term decisions
real time alerts
Model structure improvements
Learned patterns correlations
Benchmarking: Technology level
I-BiDaaS
partner
Technology name Platform role
DataBench
pipeline
Current benchmarks
FORTH GPU accelerator technology Data pre-processing, Streaming Analytics Step 3 Custom benchmark (throughput, latency)
BSC COMPSs Sequential programming model for
distributed architectures
Step 3 Applications (Own use cases)
BSC Hecuba Data management framework with easy
interface
Step 2 Applications (Own use cases)
BSC Qbeast Multidimensional indexing and storage Step 2 TPC-H
IBM Test Data Fabrication Synthetic test data fabrication Step 1 Several open source + commercial
products (e.g., Grid tools of CA) / No
known benchmarks yet
SAG Apama Streaming Analytics
Platform
Streaming Analytics Step 3 Custom benchmark (throughput)
SAG Universal Messaging Message Broker Step 1 Custom benchmark (throughput)
AEGIS Advanced visualization and
monitoring
Visualization and interface Step 4 N/A
UNSPMF Pool of ML algorithms in
COMPSs/Python
Batch analytics Step 3 Respective MPI implementation;
Sklearn; HiBench
ATOS Resource management and
orchestration module
Resource management Step 2 N/A
Benchmarking: Business, data & analytics
levelBusiness Objectives Data Sets Data Size Processing Type Type of Analysis
Telecoms
⁻ improve and optimize
current operations
⁻ Anonymized mobility data
(structured)
⁻ Anonymized call center data
(unstructured)
TB ⁻ batch & streaming ⁻ predictive
⁻ descriptive /
diagnostic
Finance
⁻ improve decision
making
⁻ improve efficiency of
Big Data solutions
⁻ Tokenized online banking
control data (structured)
⁻ Tokenized bank transfer data
(structured)
⁻ Tokenized IP address data
(structured)
PB ⁻ batch
⁻ batch & streaming
⁻ descriptive /
diagnostic
Manufacturing
⁻ improve and optimise
current operations
⁻ improve the quality of
the process and product
⁻ Anonymized SCADA/MES data
(structured)
⁻ Anonymized Aluminum Die-
casting (structured)
GB ⁻ batch
⁻ batch & streaming
⁻ predictive
⁻ diagnostic
Benchmarking: Business level
I-BiDaaS
Partner
Use Case Most relevant business KPIs
TID Accurate location prediction with high traffic and visibility - Acquisition of insights on the dynamics of
cellular sectors
- Processing costs (cost reduction)
- Customer satisfaction
TID Optimization of placement of telecommunication equipment
TID Quality of service in Call Centers
CAIXA Enhanced control on online banking - Cost reduction
- Data accessibility
- Time efficiency
- End-to-end execution time (from data request to
data provision)
CAIXA Advanced analysis of bank transfer payment in financial terminal
CAIXA Analysis of relationships through IP addresses
CRF Production process of aluminium die-casting - 3 Product quality levels (High, Medium, Low)
- Overall Equipment Effectiveness (OEE),
- Maintenance cost
- Cost reduction
CRF Maintenance and monitoring of production assets
EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020
Thank you!
Questions?
Enabling procurement data value chains for
economic development, demand management,
competitive markets and vendor intelligence
Brian Elvesæter, SINTEF
brian.elvesater@sintef.no
29© by the TBFY-consortium, 2020
https://theybuyforyou.eu
TheyBuyForYou (TBFY)
• Horizon 2020 Innovation Action
• Grant agreement No 780247
• Duration: 36 months
• Jan 2018 – Dec 2020
• Overall budget
• € 3 274 440
• Developing Big Data
tools for
Public Procurement
• data access
• data analytics
• data interaction
• data visualization
• 10 Partners
from 5 countries
30© by the TBFY-consortium, 2020
Partner Country Organisation Type
SINTEF AS NO RTO
Municipality of
Zaragoza
ES Public sector
CERVED IT Private Company
OpenCorporates UK Private Company
Josef Stefan Institute SI RTO
Ministry of Public
Sector Innovation
SI Public Sector
OESÍA Networks ES Private Company
OpenOpps UK Private Company
Universidad
Politécnica de Madrid
ES University
King's College London UK University
TBFY Knowledge Graph
© by the TBFY-consortium, 2020 31
OpenOpps
OpenCorporates
tender data
company data
RDF
Triple store
mapping and linking
Reconciliation
service
Ontology network
129 million triples
• more than 1.35 million tenders
• more than 1.58 million awards
• around 100 thousand companies
http://data.tbfy.eu
TBFY Platform
32© by the TBFY-consortium, 2020
Benchmark case #2:
Query/visualization
33© by the TBFY-consortium, 2020
• SPARQL Query Performance
• KG API (RESTful API wrapping pre-
defined SPARQL queries)
• Visualization
• Benchmark questions?
• Which triple store (RDF) database
to choose?
• Which cloud computing
(resource) plan to choose?
• https://aws.amazon.com/pricing/
• Database stability issues
• JVM heap size
• DataBench ToolBox
• LDBC Semantic Publishing
Benchmark (SPB) to benchmark
RDF triple stores
• GraphDB
• Apache Jena Fuseki & TDB
• …
Questions?
Brian Elvesæter, SINTEF
brian.elvesater@sintef.no
34© by the TBFY-consortium, 2020
https://theybuyforyou.eu
Vodafone Innovus
Fleet Management solutions
Vodafone Innovus
European Big Data Value Forum, 2020.11.04
Athanasios Koumparos
2
Getting to know Vodafone Innovus
World class capabilities build a world class eco-system
Who we are
a 100% Vodafone company located
in Athens, Greece, est. 2004
Operating in 7 Vodafone OpCos
What we do
We design and implement innovative IoT
solutions based on VF Group strategy &
customer needs
What we deliver
We have built Vodafone Group products as
well as the Global Device Management
solution
Our advantages
✓ World class Platform
✓ Global & local expertise
✓ Operations experience
✓ Agile way of working
Enterprise Fleet | Overview
3
Markets deployed
VF Greece & VF Albania
>13k active assets
>6k assets in cold chain
logistics
11 years of experience
providing an excellent
solution since 2009
Innovation
We participate in EU programs
(Track & Know, Trustonomy) and
collaborate with the Hellenic
Organization of Intelligent Transport
Systems to enhance the solution
capabilities
A fresh, award winning
approach
new platform with re-designed
UI, advanced Driver Safety &
enhanced capabilities
Logistics
Vertical
Soho / SME /
Corporate
segments
Developed by
VF Innovus
Refreshed
platform,
GDSP
connected
A well-established solution, applicable to customers with
advanced needs for location tracking, sensor monitoring & in-depth reports / analytics !
VFI IoT Platform
Enterprise Fleet
GSM Network
Low Level Device – Platform Communication
Online Data Processing
Analytics – Offline Processing
API, GUI
Devices on vehicles
Enterprise Fleet
Device in vehicles
• Devices are installed under the hood
(or bonnet)
• Vehicles vary in type, age and electronics
Data
• More than 3M data packets received
daily by the platform
• GPS data (coordinates, angle, speed,
date)
Challenges
Enterprise Fleet++
Reception problems
• Low signal strength due to installation (under the bonnet)
• Urban areas (bridges, tall buildings etc.)
• GPS signal reflections
Vehicle
• Poor quality of power supply
• Vehicle aging – noisy electricals
Issues caused by problems
Enterprise Fleet++
• Incorrect display position on map
(on top of buildings, fields, wrong road)
• False positive alerts (e.g. speed violation)
• Location gaps during route display
Track & Know
Enterprise Fleet++
• Big Data Analytics
• ML based components
• Scalable
• Online Processing
• Offline Processing
• Versatile
• Easy to integrate (Kafka Topics)
Data Collection / Collection
Live data streaming from
devices (GPS/Sensors)
Data Storage / Preparation Data Analytics / ML
Data Visualization Action /
Interaction
Visual analytics
End user GUI
Cleansing, enrichment,
storage in NoSQL stores
Pattern recognition
Location forecasting
Clustering
Mobility networks
Integration VFI IoT Platform to Track & Know
Enterprise Fleet++
Online Processing Flows:
• Data cleansing
• Data enrichment
• Data pattern recognition
• Future location prediction
• Driver behavior analysis
Offline Processing Flows
• Individual mobility networks, predict
next service period
• Hot spot analysis
• Trajectory matching
• Visualization
GSM Network
Low Level Device – Platform Communication
Online Processing
Analytics – Offline
Processing
API, GUI
Online
Offline
Data Flows T&K Platform
Enterprise Fleet++
Data cleansing tool - Map matching tool
Enterprise Fleet++
GPS post enhancements
Enterprise Fleet++
• Vehicle moving in dense urban area
• GPS reception not always good
• Vehicle moving on a highway
• Sampling rate at 30 seconds
Thank you
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
1
This project has received funding from
the European Union’s Horizon 2020
research and innovation programme
under grant agreement No 732064
This project is part
of BDV PPP
DATABIO PIPELINES
Prof. Dr. Caj Södergård,
Technical Manager of DataBio
VTT Technical Research Centre of Finland
European Big Data Forum 2020
DataBench session
4.11.2020
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
2
Data-driven Bioeconomy - DataBio
DataBio aim: Develop big data tools for enhancing production
of raw materials for food, energy and biomaterials industries
• A EU-funded project with 48 partners
• 27 bioeconomy pilots in 17 countries
• Duration 2017-2020
Responsible
and
sustainable
production
of food, energy
and biomaterials
Better raw
material
utilisation
from agriculture,
forestry and
fishery sources
New business
opportunities
through market-ready
big data technologies
Motivation
Population growth and urbanisation are
increasing the demand for natural
resources, which is putting a strain on
the Earth's carrying capacity.
We aim to develop new sustainable
ways to use forest, farm and fishery
resources and to communicate real-
time information to decision-makers
and producers.
2
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
3
DataBio platform serves the 27 pilots
Big Data Pipeline
8
Big Data Pipelines
- generic
- pilot-wise
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
4
DataBio platform serves the 27 pilots
8
Big Data Pipelines
- generic
- pilot-wise
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
5
DataBio platform serves the 27 pilots
Big Data Pipelines
- generic
- pilot-wise
8
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
6
Pipelines vs. Services
Pipeline
• A chain of real-time processing components:
• Acqusition/Collection,
• Preparation
• Analytics
• Visualisation, User interaction
• Clear interfaces between components and to outside
• A ”white box” showing internal wiring for developers
Service
• Provides usability to end users
• No display of internal wirings of components
• Accessed through API:s (web services, remote calls)
• Activated remotely through database queries (”end
points”) and executed in the cloud.
• Represents a ”black box”.
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
7
We used the BDVA Reference Model
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
8
Platform development in DataBio in numbers
• 62components from 28partners in two
trial rounds
• 1-6components per pilot (average 2)
• 14 new user interfaces
• 59 new APIs
• +2,7 in Technology Readiness Level (1-9)
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
9
Generic pipelines in DataBio
•
Generic pipeline Applied in pilots
Earth Observation and
Geospatial data
processing
A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance,
C2.2 CAP Support
Fishery: A1 Tuna fishing operations , B1 Tuna fishing
planning,
IoT data real-time
processing and decision
making
Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops
Fishery: A1
Linked Data Integration
and publication
Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other
open farm and geospatial datasets
Fishery: Virtual pilot
Privacy-aware analytics Fishery (Norway): Virtual pilot
Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum
Forestry Data
management
Forestry (Finland): 2.2.1 Data Sharing , 2.2.2
Monitoring, 2.4.2 Shared Multiuser environment
Fisheries decision support
in catch planning
Fishery: B2, C1, C2, Virtual pilot
Source: Deliverable D4.4 Service documentation
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
10
Generic pipelines in DataBio
•
Generic pipeline Applied in pilots
Earth Observation and
Geospatial data
processing
A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance,
C2.2 CAP Support
Fishery: A1 Tuna fishing operations , B1 Tuna fishing
planning,
IoT data real-time
processing and decision
making
Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops
Fishery: A1
Linked Data Integration
and publication
Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other
open farm and geospatial datasets
Fishery: Virtual pilot
Privacy-aware analytics Fishery (Norway): Virtual pilot
Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum
Forestry Data
management
Forestry (Finland): 2.2.1 Data Sharing , 2.2.2
Monitoring, 2.4.2 Shared Multiuser environment
Fisheries decision support
in catch planning
Fishery: B2, C1, C2, Virtual pilot
Source: Deliverable D4.4 Service documentation
Decision Making
Visualization
IoT devices
Real-time Data
Collection
Complex Event
Processing
Real-time Data
Preprocessing
events
complex events
complex
events
input
events events
Data Analytics
(including data
processingforanalysis,
AI andMachine
Learning)
Data Visualisation
and UserInteraction
(including Data
presentationand user
interaction)
Data
Acquisition/Collection
(including DataIngestion
processing,Storageand
Retrieval/Access/Queries,
Differentdata types)
Data Preparation
(including DataProtection,
Cleaning, Fusion,Integration,
Linking, Extraction,Curation,
Publication/Registration/,
Discovery)
Instance: Prediction and real-time
alerts of diseases and pests
breakouts in crops
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
12
Crop monitoring using ”IoT Generic Pipeline
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
13
Generic pipelines in DataBio
•
Generic pipeline Applied in pilots
Earth Observation and
Geospatial data
processing
A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance,
C2.2 CAP Support
Fishery: A1 Tuna fishing operations , B1 Tuna fishing
planning,
IoT data real-time
processing and decision
making
Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops
Fishery: A1
Linked Data Integration
and publication
Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other
open farm and geospatial datasets
Fishery: Virtual pilot
Privacy-aware analytics Fishery (Norway): Virtual pilot
Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum
Forestry Data
management
Forestry (Finland): 2.2.1 Data Sharing , 2.2.2
Monitoring, 2.4.2 Shared Multiuser environment
Fisheries decision support
in catch planning
Fishery: B2, C1, C2, Virtual pilot
Source: Deliverable D4.4 Service documentation
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
14
Linked Data Integration and Publication Pipeline
Source: Raul Palma
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
15
Case: Link discovery from knowledge graphs
Source: Raul Palma
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
16
Case: Link discovery from knowledge graphs
We applied discovery of RDF spatial links
based on topology (Geo-L):
• Identifying fields from Czech LPIS data
with specific soil type, from Czech open
data
• Identifying all fields in a specific region
which grow the same type of crops like the
one grown in a specific field over a given
period of time
• Identifying ”risky” plots from Czech LPIS
data which intersect with buffer zones
around water bodies.
Figure: Risky overlap area between a crop field
and a buffer zone of a water
Source: Raul Palma
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
17
Managing project assets: components, pipelines,
datasets and reports
• DataBio Hub (DataBioHub.eu) is central in the development platform
• Provides a catalogue of public (and private) digital assets of DataBio
• Links resources together (project reports, models, docker modules etc)
• Describes currently 101 components, 65 datasets, 25 pipelines (7 generic), 27 pilots
• Provides links to Deliverables, Interfaces (SPARQL endpoints, REST…)
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
18
Conclusions
• DataBio designed a common development platform for 27 pilots in
agriculture, forestry and fishery
• Big Data pipelines were central in this platform as a tool for developers
• We showed that a few generic pipelines can be applied in numereous
diverse applications
• Instantations of the generic pipelines were used in the pilots
• All Dtabio results are available in the searchable and cross-linked
DataBio Hub.
This document is part of a project that has received funding
from the European Union’s Horizon 2020 research and innovation programme
under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or
reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu.
19
Thank you for your attention!
35
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Pipelines for Medical Imaging Use Cases &
Requirements for Benchmarking
Federico Bolelli, Jon Ander Gómez, Costantino Grana, Roberto Paredes
Evaluation schemes for Big data and AI Performance
of high Business impact
EBDVF 2020, November 3–5, 2020
Deep-Learning and HPC to Boost Biomedical Applications for Health
36
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
About DeepHealth
Aim & Goals
• Put HPC computing power at the service of biomedical applications with DL needs and apply DL
techniques on large and complex image biomedical datasets to support new and more efficient ways of
diagnosis, monitoring and treatment of diseases.
• Facilitate the daily work and increase the productivity of medical personnel and IT professionals in
terms of image processing and the use and training of predictive models without the need of combining
numerous tools.
• Offer a unified framework adapted to exploit underlying heterogeneous HPC and Cloud architectures
supporting state-of-the-art and next-generation Deep Learning (AI) and Computer Vision algorithms to
enhance European-based medical software platforms.
Duration: 36 months
Starting date: Jan 2019
Budget 14.642.366 €
EU funding 12.774.824 €
22 partners from 9 countries:
Research centers, Health organizations,
large industries and SMEs
Research Organisations Health Organisations Large Industries SMEs
Key facts
37
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Developments & Expected Results
• The DeepHealth toolkit
• Free and open-source software: 2 libraries + front-end.
• EDDLL: The European Distributed Deep Learning Library
• ECVL: the European Computer Vision Library
• Ready to run algorithms on Hybrid HPC + Cloud architectures with heterogeneous hardware (Distributed
versions of the training algorithms)
• Ready to be integrated into end-user software platforms or applications
• HPC infrastructure for an efficient execution of the training algorithms which are computationally
intensive by making use of heterogeneous hardware in a transparent way
• Seven enhanced biomedical and AI software platforms provided by EVERIS, PHILIPS, THALES, UNITO,
WINGS, CRS4 and CEA that integrate the DeepHealth libraries to improve their potential
• Proposal for a structure for anonymised and pseudonymised data lakes
• Validation in 14 use cases (neurological diseases, tumor detection and early cancer prediction, digital
pathology and automated image annotation).
EU libraries
38
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
The Concept
SOFTWARE
PLATFORM
BIOMEDICAL
APPLICATION
Medical image
(CT , MRI , X-Ray,…)
Health professionals
(Doctors & Medical staff)
ICT Expert users
Prediction
Score and/or highlighted
RoI returned to the
doctor
Data used to
train models
Platform uses
the toolkit to
train models
(optional)
Images labelled by doctors
Add validated
images
Production environment Training environment
Using loaded/trained
predictive models
Samples to be pre-processed and validated
Images pending to
be pre-processed
and validated
HPC infrastructure
TOOLKIT
39
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
GPUCPU
A pipeline is a set of operations sequentially applied to a data block (subset of samples)
Data
loading
from a
storage
resource
Sequence of
data
augmentation
transformations
Train batch
Pipeline Definition
What do we understand by pipeline in this context?
40
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Data & Model Pipelines
Dataset
design
Data
Acquisition
Data
Curation
Persistent
Data
Storage
Data
Partitioning
Data
Augmentation
AI/ML/DL
Model
training
AI/ML/DL
Model
evaluation
Solution
Deployment
Data Pipeline
AI/ML/DL Model Pipeline
Training Pipeline is at the core of the Model Pipeline which in turn is considered part of the Data
Pipeline
Both pipelines are suitable for business and research applications
The whole Data Pipeline is applicable to any sector. Our project is focused on the Health sector
Training Pipeline
41
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Dataset
design
Data
Acquisition
Data
Curation
Persistent
Data
Storage
Data
Partitioning
Data
Augmentation
AI/ML/DL
Model
training
AI/ML/DL
Model
evaluation
Solution
Deployment
• Data types and formats –identification and definition
• Metadata definition
• Data Lake structure definition
• Guidelines / HOW-TOs
• Etc.
Data Analytics/ML
(including data processing for analysis, AI
and Machine Learning)
Data
Visualisation
+
Action /
Interaction
(including data
presentation
environment /
boundary / user
action and
interaction)
Data
Acquisition /
Collection
(including Data
Ingestion
processing,
Streaming, Data
Extraction,
Ingestion Storage,
Different data
types)
Data
Storage /
Preparation
(including Storage
Retrieval / Access
/ Queries Data
Protection,
Curation,
Integration,
Publication)
42
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Dataset
design
Data
Acquisition
Data
Curation
Persistent
Data
Storage
Data
Partitioning
Data
Augmentation
AI/ML/DL
Model
training
AI/ML/DL
Model
evaluation
Solution
Deployment
Data Analytics/ML
(including data processing for analysis, AI
and Machine Learning)
Data
Visualisation
+
Action /
Interaction
(including data
presentation
environment /
boundary / user
action and
interaction)
Data
Acquisition /
Collection
(including Data
Ingestion
processing,
Streaming, Data
Extraction,
Ingestion Storage,
Different data
types)
Data
Storage /
Preparation
(including Storage
Retrieval / Access
/ Queries Data
Protection,
Curation,
Integration,
Publication)
• Data acquisition continuum
• Data cleansing/cleaning/wrangling/crunching
• Aggregated data/values computation
• Implementation of the data-lake definition
• Creation of users and permissions or make data public
43
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Dataset
design
Data
Acquisition
Data
Curation
Persistent
Data
Storage
Data
Partitioning
Data
Augmentation
AI/ML/DL
Model
training
AI/ML/DL
Model
evaluation
Solution
Deployment
Data Analytics/ML
(including data processing for analysis, AI
and Machine Learning)
Data
Visualisation
+
Action /
Interaction
(including data
presentation
environment /
boundary / user
action and
interaction)
Data
Acquisition /
Collection
(including Data
Ingestion
processing,
Streaming, Data
Extraction,
Ingestion Storage,
Different data
types)
Data
Storage /
Preparation
(including Storage
Retrieval / Access
/ Queries Data
Protection,
Curation,
Integration,
Publication)
Model training loop:
• Partitioning in training/validation/test subsets
• Data augmentation on-the-fly
• Training and evaluation of models
• Cloud & High Performance Computing requirements
44
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Skin Lesion Detection and Classification
• Use case nº 12 of the DeepHealth project is based on the International Skin
Imaging Collaboration dataset
• Aims: identification (segmentation) and diagnosis (classification) of skin
lesion images among different classes
Dataset
design
Data
Acquisition
Data
Curation
Persistent
Data
Storage
• Retrospective acquisition
• 23.906 annotated images
• Publicly available on the
ISIC archive website
• jpeg data format
Actinic Keratosis Dermatofibroma MelanomaNevusBenign Keratoses Basal Cell
Carcinoma
Data Partitioning
• Training 19.000
• Validation 906
• Test 4.000
Data
Augmentation
Model training
Model evaluation
45
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Data
Augmentation
AI/ML/DL
Model
training
AI/ML/DL
Model
evaluation
0
0.2
0.4
0.6
0.8
Skin Lesion Detection and Classification
Multiclass
Balanced
Accuracy
Jaccard Index
(Intersection over Union)
Performed using the DeepHealth toolkit. Models
are already available in the front-end.
Explainability plays a fundamental role in this context.
Ensuring Confidence Calibration and providing a Visual
Explanation of the models is essential to support
clinicians.
46
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Needs & Requirements
Evaluate datasets in terms of
1. Findability – where should a data scientist search for the dataset?
2. Availability – how long does a data scientist need to start the initial
exploratory data analysis?
3. Interoperability – how long does a data scientist need to start training
AI/ML/DL models with a dataset?
4. Reusability – are previously obtained results with a dataset public and
available to other researchers / data scientists?
5. Privacy / Anonymisation – can the dataset be made public without
compromising the identity of individuals?
6. Quality – is the dataset biased or unbalanced? What procedure has been
followed to validate annotations?
47
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Needs & Requirements
Evaluate Deep Learning libraries in terms of
1. Speed-up – is distributed learning really efficient?
2. Convergence – does the distributed learning reach the same model
accuracy in less time?
3. Usability – how long does a developer need to use the libraries
effectively?
4. Integrability – how difficult is it to integrate the libraries as part of
solutions to deploy?
5. KPIs: time-of-training-models (totm), performance/power/accuracy
trade-off, etc.
6. Others – can you help us to evaluate other aspects?
48
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Needs & Requirements
Evaluate Software Platforms in terms of
1. Usability – how long does a domain application expert need to manage the
software tool effectively?
2. Completeness – does the application platform provide all the
algorithms/procedures/functions to allow domain application experts to
easily define the sequences of steps to implement the data and/or model
pipelines?
3. Compatibility – how many data formats does the platform admits to
import/export data and models from/to other frameworks?
4. KPIs: time-to-model-in-production (ttmip), time-of-pre-processing-images
(toppi), etc.
5. Others – can you help us to evaluate other aspects?
49
The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Questions?
@DeepHealthEU
https://deephealth-project.eu
Federico Bolelli federico.bolelli@unimore.it
Jon Ander Gómez jon@prhlt.upv.es
Costantino Grana costantino.grana@unimore.it
Roberto Paredes rparedes@prhlt.upv.es
This project has received funding from the European Horizon 2020 Programme for
research, technological development and demonstration under grant agreement n°
780966
Contacts
info@databench.eu
@DataBench_eu
DataBench
www.databench.eu
DataBench Project
DataBench
DataBench Project

Contenu connexe

Tendances

MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery Enigma
Peter O'Kelly
 
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
DataWorks Summit
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
DataWorks Summit
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
DataWorks Summit
 

Tendances (20)

Virtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench FrameworkVirtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench Framework
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence Development
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
 
MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery Enigma
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 
Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)Data Virtualization for Data Architects (New Zealand)
Data Virtualization for Data Architects (New Zealand)
 
Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0 Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0
 
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
DataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open DataDataGraft: Data-as-a-Service for Open Data
DataGraft: Data-as-a-Service for Open Data
 
Data as a service
Data as a serviceData as a service
Data as a service
 
Denodo DataFest 2016: Big Data Virtualization in the Cloud
Denodo DataFest 2016: Big Data Virtualization in the CloudDenodo DataFest 2016: Big Data Virtualization in the Cloud
Denodo DataFest 2016: Big Data Virtualization in the Cloud
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 

Similaire à Session 2 - A Project Perspective on Big Data Architectural Pipelines and Benchmarks

Similaire à Session 2 - A Project Perspective on Big Data Architectural Pipelines and Benchmarks (20)

A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Keith Prabhu - Big Data Cloud Computing
Keith Prabhu - Big Data Cloud ComputingKeith Prabhu - Big Data Cloud Computing
Keith Prabhu - Big Data Cloud Computing
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
AIS data management and time series analytics on TileDB Cloud (Webinar, Feb 3...
AIS data management and time series analytics on TileDB Cloud (Webinar, Feb 3...AIS data management and time series analytics on TileDB Cloud (Webinar, Feb 3...
AIS data management and time series analytics on TileDB Cloud (Webinar, Feb 3...
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
GraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdfGraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdf
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
Data Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud WorldData Virtualization to Survive a Multi and Hybrid Cloud World
Data Virtualization to Survive a Multi and Hybrid Cloud World
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with Bluemix
 
StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical Overview
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?Denodo Platform 7.0: What's New?
Denodo Platform 7.0: What's New?
 

Plus de DataBench

Plus de DataBench (20)

Welcome to DataBench
Welcome to DataBenchWelcome to DataBench
Welcome to DataBench
 
Session 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data BenchmarksSession 1 - The Current Landscape of Big Data Benchmarks
Session 1 - The Current Landscape of Big Data Benchmarks
 
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
Session 3 - The DataBench Framework: A compelling offering to measure the Imp...
 
Session 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench ToolboxSession 4 - A practical journey on how to use the DataBench Toolbox
Session 4 - A practical journey on how to use the DataBench Toolbox
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 
DataBench Toolbox in a Nutshell
DataBench Toolbox in a NutshellDataBench Toolbox in a Nutshell
DataBench Toolbox in a Nutshell
 
Success Stories on Big Data & Analytics
Success Stories on Big Data & AnalyticsSuccess Stories on Big Data & Analytics
Success Stories on Big Data & Analytics
 
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
DataBench Virtual BenchLearning "Success storie on Big Data & Analytics use c...
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
 
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
DataBench Virtual BenchLearning "Big Data - Benchmark your way to Excellent B...
 
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
 
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
 
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
DataBench session @ BDV Meet-Up Riga: The case of HOBBIT, 27/06/2019
 
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
DataBench in a Nutshell - The market: Assessing Industrial Needs, Richard Ste...
 
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
Impacts of data-driven AI in business sectors, Richard Stevens, ICT 2018, 05/...
 
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
Relating Big Data Business and Technical Performance Indicators, Barbara Pern...
 
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
Building a Bridge between Technical and Business Benchmarking, Gabriella Catt...
 

Dernier

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 

Dernier (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 

Session 2 - A Project Perspective on Big Data Architectural Pipelines and Benchmarks

  • 1. EUROPEAN BIG DATA VALUE FORUM 2020 Session 2. A Project perspective on Big Data architectural pipelines and benchmarks
  • 2. EUROPEAN BIG DATA VALUE FORUM 2020 Panelists Leonidas Kallipolitis Aegis - I-BiDaaS Projects Brian Elvesæter Research scientist, SINTEF TheyBuyForYou Project Caj SÖDERGÅRD Professor, VTT Data-driven Solutions DataBio Project Athanasios Koumparos Senior Software Engineer, Vodafone Innovus Track&Know Project Arne Berre Chief Scientist, SINTEF Jon Ander Gómez Adrián Universitat Politecnica de Valencia DeepHealth Project
  • 3. Introduction to Architectural pipelines • I-BiDaaS • TBFY • Track&Know • DataBio • DeepHealth
  • 4. Data Analytics/ML (including data processing for analysis, AI and Machine Learning) Data Visualisation, Action/Interaction (including data presentation environment/boundary/user action and interaction) Data Acquisition/Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types) Data Storage/Preparation (including Storage Retrieval/Access/Queries Data Protection, Curation, Integration, Publication) Pipeline steps Data types Batch, Real time, Interactive Processing types - Benchmarks - Tools/Components - Standards Trans-continuum (Edge to Cloud)
  • 5. IoT – Real time Data Pipeline
  • 8.
  • 9.
  • 10. Conclusion on Pipelines and related benchmarks, Arne J. Berre, SINTEF • I-BiDaaS • TBFY • Track&Know • DataBio • DeepHealth
  • 11. EUROPEAN BIG DATA VALUE FORUM 2020 Industrial-Driven Big Data as a Self- Service Solution Leonidas Kallipolitis, AEGIS
  • 12. EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020 Identity Card
  • 13. EUROPEAN BIG DATA VALUE FORUM 2020 1. FOUNDATION FOR RESEARCH AND TECHNOLOGY HELLAS (FORTH) 2. BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION (BSC) 3. IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD (IBM) 4. CENTRO RICERCHE FIAT SCPA (CRF) 5. SOFTWARE AG (SAG) 6. CAIXABANK, S.A (CAIXA) 7. THE UNIVERSITY OF MANCHESTER (UNIMAN) 8. ECOLE NATIONALE DES PONTS ET CHAUSSEES (ENPC) 9. ATOS SPAIN SA (ATOS) 10. AEGIS IT RESEARCH LTD (AEGIS) 11. INFORMATION TECHNOLOGY FOR MARKET LEADERSHIP (ITML) 12. UNIVERSITY OF NOVI SAD FACULTY OF SCIENCES SERBIA (UNSPMF) 13. TELEFONICA INVESTIGACION Y DESARROLLO SA (TID) Consortium
  • 14. EUROPEAN BIG DATA VALUE FORUM 2020 Key messages
  • 15. Cope with the rate of data asset growth I-BiDaaS pipeline • Expert mode Analyze your DataUsers • Import your data • Self-service mode Data • Fabricate Data • Stream & Batch Analytics Results • Visualize the results • Share models Do it yourself In a flexible manner Break data silos Safe environment Interact with Big Data technologies Increase speed of data analysis Intra- and inter- domain data-flow Benefits of using I-BiDaaS • Co-develop mode • Tokenize data • Expert: Upload your code • Self-service: Select an algorithm from the pool • Co-develop: custom end-to- end application
  • 16. • Expert mode Analyze your DataUsers • Import your data • Self-service mode Data • Fabricate Data • Stream & Batch Analytics • Expert: Upload your code • Self-service: Select an algorithm from the pool Results • Visualize the results • Share models Do it yourself In a flexible manner Break data silos Safe environment Interact with Big Data technologies Increase speed of data analysis Cope with the rate of data asset growth Intra- and inter- domain data-flow Benefits of using I-BiDaaS • Co-develop mode • Co-develop: custom end- to-end application • Tokenize dataFlexible solution I-BiDaaS pipeline
  • 17. • Expert mode Analyze your DataUsers • Import your data • Self-service mode Data • Fabricate Data • Stream & Batch Analytics • Expert: Upload your code • Self-service: Select an algorithm from the pool Results • Visualize the results • Share models Do it yourself In a flexible manner Break data silos Safe environment Interact with Big Data technologies Increase speed of data analysis Cope with the rate of data asset growth Intra- and inter- domain data-flow Benefits of using I-BiDaaS • Co-develop mode • Co-develop: custom end- to-end application • Tokenize data Data sharing & breaking silos I-BiDaaS pipeline
  • 18. • Expert mode Analyze your DataUsers • Import your data • Self-service mode Data • Fabricate Data • Stream & Batch Analytics Results • Visualize the results • Share models Do it yourself In a flexible manner Break data silos Safe environment Interact with Big Data technologies Increase speed of data analysis Cope with the rate of data asset growth Intra- and inter- domain data-flow Benefits of using I-BiDaaS • Co-develop mode • Tokenize data I-BiDaaS pipeline • Expert: Upload your code • Self-service: Select an algorithm from the pool • Co-develop: custom end-to- end application
  • 19. • Expert mode Analyze your DataUsers • Import your data • Self-service mode Data • Fabricate Data • Stream & Batch Analytics • Expert: Upload your code • Self-service: Select an algorithm from the pool Results • Visualize the results • Share models Do it yourself In a flexible manner Break data silos Safe environment Interact with Big Data technologies Increase speed of data analysis Cope with the rate of data asset growth Intra- and inter- domain data-flow Benefits of using I-BiDaaS • Co-develop mode • Co-develop: custom end-to- end application • Tokenize data I-BiDaaS pipeline
  • 20. Project setup Data Selection Data preparation Experiment execution Results visualization I-BiDaaS Experimental Workflow Data Analytics/ML (including data processing for analysis, AI and Machine Learning) Data Visualisation, Action/Interaction (including data presentation environment/boundary/user action and interaction) Data Acquisition/Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types) Data Storage/Preparation (including Storage Retrieval/Access/Queries Data Protection, Curation, Integration, Publication) DataBench Pipeline
  • 21. Project setup Data Selection Data preparation Experiment execution Results visualization Example: Banking Experiments Workflow Batch & Streaming Data Identify data collected by the different sources (ATMs, online banking services, employees’ workstations, external providers’ activity, network devices, etc.), stored in CAIXA datapool Data Fabrication Data Integration Data Anonymization Data Encryption Clustering Graph Based Analysis Static charts for batch analysis results Real-time interactive graphics for analysis results of constantly incoming streaming data DataBench Pipeline
  • 22. EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020 Medium to long term business decisions Data Fabrication Platform (IBM) Refined specifications for data fabrication GPU-accelerated Analytics (FORTH) Apama Complex Event Processing (SAG) Streaming Analytics Batch Processing Advanced ML (UNSPMF) COMPsProgramming Model(BSC) Query Partitioning Infrastructurelayer: Private cloud; Commodity cluster; GPUs Pre-defined Queries SQL-like interface Domain Language Programming API User Interface Resource managementand orchestration (ATOS) Advanced Visualis. Advanced IT services for Big Data processing tasks; Open source pool of ML algorithms Data ingestion Programming Interface / Sequential Programming (AEGIS+SAG) (AEGIS) COMPsRuntime (BSC) Distributed large scale layer Application layer UniversalMessaging(SAG) TestData Fabrication(IBM)Meta- data; Data descri- ption Hecuba tools (BSC) Short term decisions real time alerts Model structure improvements Learnedpatterns correlations The I- BiDaaS solution: Architecture & technologie
  • 23. EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020 WP2 Data, user interface, visualizationMedium to long term business decisions Data Fabrication Platform (IBM) Refined specifications for data fabrication GPU-accelerated Analytics (FORTH) Apama Complex Event Processing (SAG) Streaming Analytics Batch Processing Advanced ML (UNSPMF) COMPs Programming Model (BSC) Query Partitioning Infrastructure layer: Private cloud; Commodity cluster; GPUs Pre-defined Queries SQL-like interface Domain Language Programming API User Interface Resource management and orchestration (ATOS) Advanced Visualis. Advanced IT services for Big Data processing tasks; Open source pool of ML algorithms Programming Interface / Sequential Programming (AEGIS+SAG) (AEGIS) COMPs Runtime (BSC) Distributed large scale layer Application layer UniversalMessaging(SAG) DataFabrication Platform(IBM) Meta- data; Data descri- ption Hecuba tools (BSC) Short term decisions real time alerts Model structure improvements Learned patterns correlations Technologies: • IBM TDF • SAG UM • AEGIS AVT http://ibidaas.eu/tools
  • 24. EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020 WP3 Batch Analytics Technologies: • BSC COMPSs • BSC Hecuba • BSC Qbeast • Advanced ML (UNSPMF) http://ibidaas.eu/tools Medium to long term business decisions Data Fabrication Platform (IBM) Refined specifications for data fabrication GPU-accelerated Analytics (FORTH) Apama Complex Event Processing (SAG) Streaming Analytics Batch Processing Advanced ML (UNSPMF) COMPs Programming Model (BSC) Query Partitioning Infrastructure layer: Private cloud; Commodity cluster; GPUs Pre-defined Queries SQL-like interface Domain Language Programming API User Interface Resource management and orchestration (ATOS) Advanced Visualis. Advanced IT services for Big Data processing tasks; Open source pool of ML algorithms Programming Interface / Sequential Programming (AEGIS+SAG) (AEGIS) COMPs Runtime (BSC) Distributed large scale layer Application layer UniversalMessaging(SAG) DataFabrication Platform(IBM) Meta- data; Data descri- ption Hecuba tools (BSC) Short term decisions real time alerts Model structure improvements Learned patterns correlations
  • 25. Benchmarking: Technology level I-BiDaaS partner Technology name Platform role DataBench pipeline Current benchmarks FORTH GPU accelerator technology Data pre-processing, Streaming Analytics Step 3 Custom benchmark (throughput, latency) BSC COMPSs Sequential programming model for distributed architectures Step 3 Applications (Own use cases) BSC Hecuba Data management framework with easy interface Step 2 Applications (Own use cases) BSC Qbeast Multidimensional indexing and storage Step 2 TPC-H IBM Test Data Fabrication Synthetic test data fabrication Step 1 Several open source + commercial products (e.g., Grid tools of CA) / No known benchmarks yet SAG Apama Streaming Analytics Platform Streaming Analytics Step 3 Custom benchmark (throughput) SAG Universal Messaging Message Broker Step 1 Custom benchmark (throughput) AEGIS Advanced visualization and monitoring Visualization and interface Step 4 N/A UNSPMF Pool of ML algorithms in COMPSs/Python Batch analytics Step 3 Respective MPI implementation; Sklearn; HiBench ATOS Resource management and orchestration module Resource management Step 2 N/A
  • 26. Benchmarking: Business, data & analytics levelBusiness Objectives Data Sets Data Size Processing Type Type of Analysis Telecoms ⁻ improve and optimize current operations ⁻ Anonymized mobility data (structured) ⁻ Anonymized call center data (unstructured) TB ⁻ batch & streaming ⁻ predictive ⁻ descriptive / diagnostic Finance ⁻ improve decision making ⁻ improve efficiency of Big Data solutions ⁻ Tokenized online banking control data (structured) ⁻ Tokenized bank transfer data (structured) ⁻ Tokenized IP address data (structured) PB ⁻ batch ⁻ batch & streaming ⁻ descriptive / diagnostic Manufacturing ⁻ improve and optimise current operations ⁻ improve the quality of the process and product ⁻ Anonymized SCADA/MES data (structured) ⁻ Anonymized Aluminum Die- casting (structured) GB ⁻ batch ⁻ batch & streaming ⁻ predictive ⁻ diagnostic
  • 27. Benchmarking: Business level I-BiDaaS Partner Use Case Most relevant business KPIs TID Accurate location prediction with high traffic and visibility - Acquisition of insights on the dynamics of cellular sectors - Processing costs (cost reduction) - Customer satisfaction TID Optimization of placement of telecommunication equipment TID Quality of service in Call Centers CAIXA Enhanced control on online banking - Cost reduction - Data accessibility - Time efficiency - End-to-end execution time (from data request to data provision) CAIXA Advanced analysis of bank transfer payment in financial terminal CAIXA Analysis of relationships through IP addresses CRF Production process of aluminium die-casting - 3 Product quality levels (High, Medium, Low) - Overall Equipment Effectiveness (OEE), - Maintenance cost - Cost reduction CRF Maintenance and monitoring of production assets
  • 28. EUROPEAN BIG DATA VALUE FORUM 2020EUROPEAN BIG DATA VALUE FORUM 2020 Thank you! Questions?
  • 29. Enabling procurement data value chains for economic development, demand management, competitive markets and vendor intelligence Brian Elvesæter, SINTEF brian.elvesater@sintef.no 29© by the TBFY-consortium, 2020 https://theybuyforyou.eu
  • 30. TheyBuyForYou (TBFY) • Horizon 2020 Innovation Action • Grant agreement No 780247 • Duration: 36 months • Jan 2018 – Dec 2020 • Overall budget • € 3 274 440 • Developing Big Data tools for Public Procurement • data access • data analytics • data interaction • data visualization • 10 Partners from 5 countries 30© by the TBFY-consortium, 2020 Partner Country Organisation Type SINTEF AS NO RTO Municipality of Zaragoza ES Public sector CERVED IT Private Company OpenCorporates UK Private Company Josef Stefan Institute SI RTO Ministry of Public Sector Innovation SI Public Sector OESÍA Networks ES Private Company OpenOpps UK Private Company Universidad Politécnica de Madrid ES University King's College London UK University
  • 31. TBFY Knowledge Graph © by the TBFY-consortium, 2020 31 OpenOpps OpenCorporates tender data company data RDF Triple store mapping and linking Reconciliation service Ontology network 129 million triples • more than 1.35 million tenders • more than 1.58 million awards • around 100 thousand companies http://data.tbfy.eu
  • 32. TBFY Platform 32© by the TBFY-consortium, 2020
  • 33. Benchmark case #2: Query/visualization 33© by the TBFY-consortium, 2020 • SPARQL Query Performance • KG API (RESTful API wrapping pre- defined SPARQL queries) • Visualization • Benchmark questions? • Which triple store (RDF) database to choose? • Which cloud computing (resource) plan to choose? • https://aws.amazon.com/pricing/ • Database stability issues • JVM heap size • DataBench ToolBox • LDBC Semantic Publishing Benchmark (SPB) to benchmark RDF triple stores • GraphDB • Apache Jena Fuseki & TDB • …
  • 34. Questions? Brian Elvesæter, SINTEF brian.elvesater@sintef.no 34© by the TBFY-consortium, 2020 https://theybuyforyou.eu
  • 35. Vodafone Innovus Fleet Management solutions Vodafone Innovus European Big Data Value Forum, 2020.11.04 Athanasios Koumparos
  • 36. 2 Getting to know Vodafone Innovus World class capabilities build a world class eco-system Who we are a 100% Vodafone company located in Athens, Greece, est. 2004 Operating in 7 Vodafone OpCos What we do We design and implement innovative IoT solutions based on VF Group strategy & customer needs What we deliver We have built Vodafone Group products as well as the Global Device Management solution Our advantages ✓ World class Platform ✓ Global & local expertise ✓ Operations experience ✓ Agile way of working
  • 37. Enterprise Fleet | Overview 3 Markets deployed VF Greece & VF Albania >13k active assets >6k assets in cold chain logistics 11 years of experience providing an excellent solution since 2009 Innovation We participate in EU programs (Track & Know, Trustonomy) and collaborate with the Hellenic Organization of Intelligent Transport Systems to enhance the solution capabilities A fresh, award winning approach new platform with re-designed UI, advanced Driver Safety & enhanced capabilities Logistics Vertical Soho / SME / Corporate segments Developed by VF Innovus Refreshed platform, GDSP connected A well-established solution, applicable to customers with advanced needs for location tracking, sensor monitoring & in-depth reports / analytics !
  • 38. VFI IoT Platform Enterprise Fleet GSM Network Low Level Device – Platform Communication Online Data Processing Analytics – Offline Processing API, GUI
  • 39. Devices on vehicles Enterprise Fleet Device in vehicles • Devices are installed under the hood (or bonnet) • Vehicles vary in type, age and electronics Data • More than 3M data packets received daily by the platform • GPS data (coordinates, angle, speed, date)
  • 40. Challenges Enterprise Fleet++ Reception problems • Low signal strength due to installation (under the bonnet) • Urban areas (bridges, tall buildings etc.) • GPS signal reflections Vehicle • Poor quality of power supply • Vehicle aging – noisy electricals
  • 41. Issues caused by problems Enterprise Fleet++ • Incorrect display position on map (on top of buildings, fields, wrong road) • False positive alerts (e.g. speed violation) • Location gaps during route display
  • 42. Track & Know Enterprise Fleet++ • Big Data Analytics • ML based components • Scalable • Online Processing • Offline Processing • Versatile • Easy to integrate (Kafka Topics) Data Collection / Collection Live data streaming from devices (GPS/Sensors) Data Storage / Preparation Data Analytics / ML Data Visualization Action / Interaction Visual analytics End user GUI Cleansing, enrichment, storage in NoSQL stores Pattern recognition Location forecasting Clustering Mobility networks
  • 43. Integration VFI IoT Platform to Track & Know Enterprise Fleet++ Online Processing Flows: • Data cleansing • Data enrichment • Data pattern recognition • Future location prediction • Driver behavior analysis Offline Processing Flows • Individual mobility networks, predict next service period • Hot spot analysis • Trajectory matching • Visualization GSM Network Low Level Device – Platform Communication Online Processing Analytics – Offline Processing API, GUI Online Offline
  • 44. Data Flows T&K Platform Enterprise Fleet++
  • 45. Data cleansing tool - Map matching tool Enterprise Fleet++
  • 46. GPS post enhancements Enterprise Fleet++ • Vehicle moving in dense urban area • GPS reception not always good • Vehicle moving on a highway • Sampling rate at 30 seconds
  • 48. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 1 This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064 This project is part of BDV PPP DATABIO PIPELINES Prof. Dr. Caj Södergård, Technical Manager of DataBio VTT Technical Research Centre of Finland European Big Data Forum 2020 DataBench session 4.11.2020
  • 49. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 2 Data-driven Bioeconomy - DataBio DataBio aim: Develop big data tools for enhancing production of raw materials for food, energy and biomaterials industries • A EU-funded project with 48 partners • 27 bioeconomy pilots in 17 countries • Duration 2017-2020 Responsible and sustainable production of food, energy and biomaterials Better raw material utilisation from agriculture, forestry and fishery sources New business opportunities through market-ready big data technologies Motivation Population growth and urbanisation are increasing the demand for natural resources, which is putting a strain on the Earth's carrying capacity. We aim to develop new sustainable ways to use forest, farm and fishery resources and to communicate real- time information to decision-makers and producers. 2
  • 50. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 3 DataBio platform serves the 27 pilots Big Data Pipeline 8 Big Data Pipelines - generic - pilot-wise
  • 51. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 4 DataBio platform serves the 27 pilots 8 Big Data Pipelines - generic - pilot-wise
  • 52. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 5 DataBio platform serves the 27 pilots Big Data Pipelines - generic - pilot-wise 8
  • 53. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 6 Pipelines vs. Services Pipeline • A chain of real-time processing components: • Acqusition/Collection, • Preparation • Analytics • Visualisation, User interaction • Clear interfaces between components and to outside • A ”white box” showing internal wiring for developers Service • Provides usability to end users • No display of internal wirings of components • Accessed through API:s (web services, remote calls) • Activated remotely through database queries (”end points”) and executed in the cloud. • Represents a ”black box”.
  • 54. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 7 We used the BDVA Reference Model
  • 55. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 8 Platform development in DataBio in numbers • 62components from 28partners in two trial rounds • 1-6components per pilot (average 2) • 14 new user interfaces • 59 new APIs • +2,7 in Technology Readiness Level (1-9)
  • 56. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 9 Generic pipelines in DataBio • Generic pipeline Applied in pilots Earth Observation and Geospatial data processing A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance, C2.2 CAP Support Fishery: A1 Tuna fishing operations , B1 Tuna fishing planning, IoT data real-time processing and decision making Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops Fishery: A1 Linked Data Integration and publication Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other open farm and geospatial datasets Fishery: Virtual pilot Privacy-aware analytics Fishery (Norway): Virtual pilot Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum Forestry Data management Forestry (Finland): 2.2.1 Data Sharing , 2.2.2 Monitoring, 2.4.2 Shared Multiuser environment Fisheries decision support in catch planning Fishery: B2, C1, C2, Virtual pilot Source: Deliverable D4.4 Service documentation
  • 57. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 10 Generic pipelines in DataBio • Generic pipeline Applied in pilots Earth Observation and Geospatial data processing A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance, C2.2 CAP Support Fishery: A1 Tuna fishing operations , B1 Tuna fishing planning, IoT data real-time processing and decision making Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops Fishery: A1 Linked Data Integration and publication Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other open farm and geospatial datasets Fishery: Virtual pilot Privacy-aware analytics Fishery (Norway): Virtual pilot Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum Forestry Data management Forestry (Finland): 2.2.1 Data Sharing , 2.2.2 Monitoring, 2.4.2 Shared Multiuser environment Fisheries decision support in catch planning Fishery: B2, C1, C2, Virtual pilot Source: Deliverable D4.4 Service documentation
  • 58. Decision Making Visualization IoT devices Real-time Data Collection Complex Event Processing Real-time Data Preprocessing events complex events complex events input events events Data Analytics (including data processingforanalysis, AI andMachine Learning) Data Visualisation and UserInteraction (including Data presentationand user interaction) Data Acquisition/Collection (including DataIngestion processing,Storageand Retrieval/Access/Queries, Differentdata types) Data Preparation (including DataProtection, Cleaning, Fusion,Integration, Linking, Extraction,Curation, Publication/Registration/, Discovery) Instance: Prediction and real-time alerts of diseases and pests breakouts in crops
  • 59. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 12 Crop monitoring using ”IoT Generic Pipeline
  • 60. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 13 Generic pipelines in DataBio • Generic pipeline Applied in pilots Earth Observation and Geospatial data processing A1.1 Precision agri B1.2 Precision agri , C1.1 Insurance, C2.2 CAP Support Fishery: A1 Tuna fishing operations , B1 Tuna fishing planning, IoT data real-time processing and decision making Agri: A1.1 Precision agri, B1.1 Cereals & biomass crops Fishery: A1 Linked Data Integration and publication Agri: B1.4 Precision agri B2.1 Machinery mgmt, Other open farm and geospatial datasets Fishery: Virtual pilot Privacy-aware analytics Fishery (Norway): Virtual pilot Genomics Agri: A2.1 Greenhouse , B1.3 Biomass sorghum Forestry Data management Forestry (Finland): 2.2.1 Data Sharing , 2.2.2 Monitoring, 2.4.2 Shared Multiuser environment Fisheries decision support in catch planning Fishery: B2, C1, C2, Virtual pilot Source: Deliverable D4.4 Service documentation
  • 61. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 14 Linked Data Integration and Publication Pipeline Source: Raul Palma
  • 62. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 15 Case: Link discovery from knowledge graphs Source: Raul Palma
  • 63. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 16 Case: Link discovery from knowledge graphs We applied discovery of RDF spatial links based on topology (Geo-L): • Identifying fields from Czech LPIS data with specific soil type, from Czech open data • Identifying all fields in a specific region which grow the same type of crops like the one grown in a specific field over a given period of time • Identifying ”risky” plots from Czech LPIS data which intersect with buffer zones around water bodies. Figure: Risky overlap area between a crop field and a buffer zone of a water Source: Raul Palma
  • 64. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 17 Managing project assets: components, pipelines, datasets and reports • DataBio Hub (DataBioHub.eu) is central in the development platform • Provides a catalogue of public (and private) digital assets of DataBio • Links resources together (project reports, models, docker modules etc) • Describes currently 101 components, 65 datasets, 25 pipelines (7 generic), 27 pilots • Provides links to Deliverables, Interfaces (SPARQL endpoints, REST…)
  • 65. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 18 Conclusions • DataBio designed a common development platform for 27 pilots in agriculture, forestry and fishery • Big Data pipelines were central in this platform as a tool for developers • We showed that a few generic pipelines can be applied in numereous diverse applications • Instantations of the generic pipelines were used in the pilots • All Dtabio results are available in the searchable and cross-linked DataBio Hub.
  • 66. This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee. Find us at www.databio.eu. 19 Thank you for your attention!
  • 67. 35 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Pipelines for Medical Imaging Use Cases & Requirements for Benchmarking Federico Bolelli, Jon Ander Gómez, Costantino Grana, Roberto Paredes Evaluation schemes for Big data and AI Performance of high Business impact EBDVF 2020, November 3–5, 2020 Deep-Learning and HPC to Boost Biomedical Applications for Health
  • 68. 36 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. About DeepHealth Aim & Goals • Put HPC computing power at the service of biomedical applications with DL needs and apply DL techniques on large and complex image biomedical datasets to support new and more efficient ways of diagnosis, monitoring and treatment of diseases. • Facilitate the daily work and increase the productivity of medical personnel and IT professionals in terms of image processing and the use and training of predictive models without the need of combining numerous tools. • Offer a unified framework adapted to exploit underlying heterogeneous HPC and Cloud architectures supporting state-of-the-art and next-generation Deep Learning (AI) and Computer Vision algorithms to enhance European-based medical software platforms. Duration: 36 months Starting date: Jan 2019 Budget 14.642.366 € EU funding 12.774.824 € 22 partners from 9 countries: Research centers, Health organizations, large industries and SMEs Research Organisations Health Organisations Large Industries SMEs Key facts
  • 69. 37 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Developments & Expected Results • The DeepHealth toolkit • Free and open-source software: 2 libraries + front-end. • EDDLL: The European Distributed Deep Learning Library • ECVL: the European Computer Vision Library • Ready to run algorithms on Hybrid HPC + Cloud architectures with heterogeneous hardware (Distributed versions of the training algorithms) • Ready to be integrated into end-user software platforms or applications • HPC infrastructure for an efficient execution of the training algorithms which are computationally intensive by making use of heterogeneous hardware in a transparent way • Seven enhanced biomedical and AI software platforms provided by EVERIS, PHILIPS, THALES, UNITO, WINGS, CRS4 and CEA that integrate the DeepHealth libraries to improve their potential • Proposal for a structure for anonymised and pseudonymised data lakes • Validation in 14 use cases (neurological diseases, tumor detection and early cancer prediction, digital pathology and automated image annotation). EU libraries
  • 70. 38 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. The Concept SOFTWARE PLATFORM BIOMEDICAL APPLICATION Medical image (CT , MRI , X-Ray,…) Health professionals (Doctors & Medical staff) ICT Expert users Prediction Score and/or highlighted RoI returned to the doctor Data used to train models Platform uses the toolkit to train models (optional) Images labelled by doctors Add validated images Production environment Training environment Using loaded/trained predictive models Samples to be pre-processed and validated Images pending to be pre-processed and validated HPC infrastructure TOOLKIT
  • 71. 39 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. GPUCPU A pipeline is a set of operations sequentially applied to a data block (subset of samples) Data loading from a storage resource Sequence of data augmentation transformations Train batch Pipeline Definition What do we understand by pipeline in this context?
  • 72. 40 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Data & Model Pipelines Dataset design Data Acquisition Data Curation Persistent Data Storage Data Partitioning Data Augmentation AI/ML/DL Model training AI/ML/DL Model evaluation Solution Deployment Data Pipeline AI/ML/DL Model Pipeline Training Pipeline is at the core of the Model Pipeline which in turn is considered part of the Data Pipeline Both pipelines are suitable for business and research applications The whole Data Pipeline is applicable to any sector. Our project is focused on the Health sector Training Pipeline
  • 73. 41 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Dataset design Data Acquisition Data Curation Persistent Data Storage Data Partitioning Data Augmentation AI/ML/DL Model training AI/ML/DL Model evaluation Solution Deployment • Data types and formats –identification and definition • Metadata definition • Data Lake structure definition • Guidelines / HOW-TOs • Etc. Data Analytics/ML (including data processing for analysis, AI and Machine Learning) Data Visualisation + Action / Interaction (including data presentation environment / boundary / user action and interaction) Data Acquisition / Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types) Data Storage / Preparation (including Storage Retrieval / Access / Queries Data Protection, Curation, Integration, Publication)
  • 74. 42 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Dataset design Data Acquisition Data Curation Persistent Data Storage Data Partitioning Data Augmentation AI/ML/DL Model training AI/ML/DL Model evaluation Solution Deployment Data Analytics/ML (including data processing for analysis, AI and Machine Learning) Data Visualisation + Action / Interaction (including data presentation environment / boundary / user action and interaction) Data Acquisition / Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types) Data Storage / Preparation (including Storage Retrieval / Access / Queries Data Protection, Curation, Integration, Publication) • Data acquisition continuum • Data cleansing/cleaning/wrangling/crunching • Aggregated data/values computation • Implementation of the data-lake definition • Creation of users and permissions or make data public
  • 75. 43 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Dataset design Data Acquisition Data Curation Persistent Data Storage Data Partitioning Data Augmentation AI/ML/DL Model training AI/ML/DL Model evaluation Solution Deployment Data Analytics/ML (including data processing for analysis, AI and Machine Learning) Data Visualisation + Action / Interaction (including data presentation environment / boundary / user action and interaction) Data Acquisition / Collection (including Data Ingestion processing, Streaming, Data Extraction, Ingestion Storage, Different data types) Data Storage / Preparation (including Storage Retrieval / Access / Queries Data Protection, Curation, Integration, Publication) Model training loop: • Partitioning in training/validation/test subsets • Data augmentation on-the-fly • Training and evaluation of models • Cloud & High Performance Computing requirements
  • 76. 44 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Skin Lesion Detection and Classification • Use case nº 12 of the DeepHealth project is based on the International Skin Imaging Collaboration dataset • Aims: identification (segmentation) and diagnosis (classification) of skin lesion images among different classes Dataset design Data Acquisition Data Curation Persistent Data Storage • Retrospective acquisition • 23.906 annotated images • Publicly available on the ISIC archive website • jpeg data format Actinic Keratosis Dermatofibroma MelanomaNevusBenign Keratoses Basal Cell Carcinoma Data Partitioning • Training 19.000 • Validation 906 • Test 4.000 Data Augmentation Model training Model evaluation
  • 77. 45 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Data Augmentation AI/ML/DL Model training AI/ML/DL Model evaluation 0 0.2 0.4 0.6 0.8 Skin Lesion Detection and Classification Multiclass Balanced Accuracy Jaccard Index (Intersection over Union) Performed using the DeepHealth toolkit. Models are already available in the front-end. Explainability plays a fundamental role in this context. Ensuring Confidence Calibration and providing a Visual Explanation of the models is essential to support clinicians.
  • 78. 46 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Needs & Requirements Evaluate datasets in terms of 1. Findability – where should a data scientist search for the dataset? 2. Availability – how long does a data scientist need to start the initial exploratory data analysis? 3. Interoperability – how long does a data scientist need to start training AI/ML/DL models with a dataset? 4. Reusability – are previously obtained results with a dataset public and available to other researchers / data scientists? 5. Privacy / Anonymisation – can the dataset be made public without compromising the identity of individuals? 6. Quality – is the dataset biased or unbalanced? What procedure has been followed to validate annotations?
  • 79. 47 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Needs & Requirements Evaluate Deep Learning libraries in terms of 1. Speed-up – is distributed learning really efficient? 2. Convergence – does the distributed learning reach the same model accuracy in less time? 3. Usability – how long does a developer need to use the libraries effectively? 4. Integrability – how difficult is it to integrate the libraries as part of solutions to deploy? 5. KPIs: time-of-training-models (totm), performance/power/accuracy trade-off, etc. 6. Others – can you help us to evaluate other aspects?
  • 80. 48 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Needs & Requirements Evaluate Software Platforms in terms of 1. Usability – how long does a domain application expert need to manage the software tool effectively? 2. Completeness – does the application platform provide all the algorithms/procedures/functions to allow domain application experts to easily define the sequences of steps to implement the data and/or model pipelines? 3. Compatibility – how many data formats does the platform admits to import/export data and models from/to other frameworks? 4. KPIs: time-to-model-in-production (ttmip), time-of-pre-processing-images (toppi), etc. 5. Others – can you help us to evaluate other aspects?
  • 81. 49 The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111. Questions? @DeepHealthEU https://deephealth-project.eu Federico Bolelli federico.bolelli@unimore.it Jon Ander Gómez jon@prhlt.upv.es Costantino Grana costantino.grana@unimore.it Roberto Paredes rparedes@prhlt.upv.es
  • 82. This project has received funding from the European Horizon 2020 Programme for research, technological development and demonstration under grant agreement n° 780966 Contacts info@databench.eu @DataBench_eu DataBench www.databench.eu DataBench Project DataBench DataBench Project