SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
TOWARDS A BIG DATA ROADMAP
FOR EUROPE
Martin Strohbach, AGT International
Tilman Becker, Edward Curry, John Domnique, Ricard
Munné, Sebnem Rusitschka, Sonja Zillner
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
2
OVERVIEW
• Big Data Public Private Forum and Partnership
• 3 Data Sharing Use Cases
• Methodology for creating a community based roadmap
• Findings from the Use Cases, Sectors and Data Value Chain Analysis
• Impact of the findings towards a European roadmap
• Conclusions
2nd JTC 1 SGBD Meeting - Workshop
THE EU PROJECT BIG
BIG DATA PUBLIC PRIVATE FORUM
Europe needs a clear strategy for leveraging Big Data
Economy in Europe
Work at technical, business and policy levels, shaping the
future through the positioning of Big Data in Horizon 2020.
Bringing the necessary stakeholders into a sustainable
industry-led initiative, which will greatly contribute to
enhance the EU competitiveness taking full advantage of
Big Data technologies.
Objectives
Trigger
Type of project: Coordination & Support Action
Project start date: September 2012
Duration: 26 months
Call: FP7-ICT-2011-8
Budget: 3,038 M€
Consortium: 11 partners
Facts
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
4
BIG DATA PRIVATE PUBLIC PARTNERSHIP
• Objectives: „The goal […] is to increase the amount
of productive European economic activities and the
number of European jobs that depend on the
availability of high quality data assets and the
technologies needed to derive value from them.”
(Source. Strategic Research and Innovation Agenda, SRIA)
• Neelie Kroes, EU Commissioner for the Digital
Agenda: „This is a revolution and I want the EU
to be right at the front of it“
• 29 European organisations from
industry and research
• Results of public consultattion to
be presented at NESSI Summit
Brussels, 27th May
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
SELECTED USE CASES
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
6
OVERCOMING THE DATA SILO CULTURE IN
PUBLIC SECTOR
ID data
Tax data
Civil
Registry
Home Office
Tax Agency
Civil Registry
Education
Ministry
Local
Administration
Data
aggregation
process
Citizen
Scholarship
Registry
2nd JTC 1 SGBD Meeting - Workshop
USE CASE HEALTHCARE
SECONDARY USAGE OF HEALTH DATA
▶ Secondary usage of health data is defined as the
aggregation, analysis and concise presentation of
clinical, financial, administrative as well as other
related health data
▶ in order to discover new valuable knowledge, for
instance to identify trends, predict outcomes or
influence patient care, drug development, or
therapy choices.
Description
Example Use Cases
▶ Example 1: Patient recruiting and profiling suitable
for conducting clinical studies. Today, often
clinical studies, in particular studies investigating
rare diseases, fail due to the fact that not enough
patients are available for conducting clinical
studies (Berliner Forschungsplattform Gesundheit
(BFG), Astra Zeneca)
▶ Example 2: Social Media Based Medication
Intelligence Mining of health data on the discover
insights about side effects of medications (e.g.
Treato)
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
8
PEER ENERGY CLOUD
Smart grid pilot in Saarlouis
100 households
Berlin
SaarlouisInnovation award
Engage consumers to optimally
use local solar energy
 Understand consumption and
save
 Trade solar energy in the
neighborhood to balance
the grid
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
9
DEVICE LEVEL ENERGY MONITORING
Monitored/controlled grid today
Monitored/controlled grid tomorrow
Germany aims at 30%
clean/renewable energy by
2020, seeking to build a smart
grid
Sensors
today
Sensors
tomorrow
(consumer
level)
Energy
Consumption
Temperature
Movement,...
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
10
GETTING READY FOR DATA VOLUMES IN
FUTURE GRIDS
PeerEnergyCloud Pilots allows us to get ready for future data
volumes today
How much data is really needed
for what?
1 value per
year
35.040 values
per year
today smart
metering
540 million
values per year
PeerEnergy-
Cloud
? Billion values
per year
Future
possibilities
Optimum?
7 devices per
household every
2 seconds , 4-5
measurements
per devices
every 15
minutes
real-time analytics
on mass data (grouped
aggregation)
Scalable statistics
over hundreds of millions
of measurements
Automatic detection
of load anomalies
(spotting inefficiencies
and defects)
Household activity
state inference and
prediction
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
BIG METHODOLOGY
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
12
SECTORIAL FORUMS AND TECHNICAL
WORKING GROUPS
Health Public Sector
Finance &
Insurance
Telco, Media&
Entertainment
Manufacturing,
Retail, Energy,
Transport
Needs Offerings
Big Data Value Chain
Technical Working Groups
Industry Driven Sectorial Forums
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
• Structured data
• Unstructured data
• Event processing
• Sensor networks
• Protocols
• Real-time
• Data streams
• Multimodality
• Structured data
• Unstructured data
• Event processing
• Sensor networks
• Protocols
• Real-time
• Data streams
• Multimodality
• Stream mining
• Semantic analysis
• Machine learning
• Information
extraction
• Linked Data
• Data discovery
• ‘Whole world’
semantics
• Ecosystems
• Community data
analysis
• Cross-sectorial data
analysis
• Stream mining
• Semantic analysis
• Machine learning
• Information
extraction
• Linked Data
• Data discovery
• ‘Whole world’
semantics
• Ecosystems
• Community data
analysis
• Cross-sectorial data
analysis
• Data Quality
• Trust / Provenance
• Annotation
• Data validation
• Human-Data
Interaction
• Top-down/Bottom-up
• Community / Crowd
• Human Computation
• Curation at scale
• Incentivisation
• Automation
• Interoperability
• Data Quality
• Trust / Provenance
• Annotation
• Data validation
• Human-Data
Interaction
• Top-down/Bottom-up
• Community / Crowd
• Human Computation
• Curation at scale
• Incentivisation
• Automation
• Interoperability
• In-Memory DBs
• NoSQL DBs
• NewSQL DBs
• Cloud storage
• Query Interfaces
• Scalability and
Performance
• Data Models
• Consistency,
Availability, Partition-
tolerance
• Security and Privacy
• Standardization
• In-Memory DBs
• NoSQL DBs
• NewSQL DBs
• Cloud storage
• Query Interfaces
• Scalability and
Performance
• Data Models
• Consistency,
Availability, Partition-
tolerance
• Security and Privacy
• Standardization
• Decision support
• Prediction
• In-use analytics
• Simulation
• Exploration
• Visualisation
• Modeling
• Control
• Domain-specific
usage
• Decision support
• Prediction
• In-use analytics
• Simulation
• Exploration
• Visualisation
• Modeling
• Control
• Domain-specific
usage
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
13
SECTOR ANALYSIS METHODOLOGY
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
14
TECHNICAL WORKGROUP APPROACH
Senior	
Academic
Senior	
Management
Middle	
Researcher
Middle	
Management
Position	in	Organisation
University
MNC
SME
Other
Types	of	Organisations
1. Literature & Technical Survey
2. Subject Matter Expert Interviews
3. Stakeholder Workshops
4. Online Questionnaire (with
NESSI)
• Early adopters
• Business enablement
• Technical maturity
• Key Opinion Leaders
Methodology
Interviewee Breakdown
Target Interviewee
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
TWG FINDINGS
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
16
KEY INSIGHTS
Key Trends
▶ Lower usability barrier for data tools
▶ Blended human and algorithmic data processing for coping with
for data quality
▶ Leveraging large communities (crowds)
▶ Need for semantic standardized data representation
▶ Significant increase in use of new data models (i.e. graph)
(expressivity and flexibility)
▶ Much of (Big Data) technology
is evolving evolutionary
▶ But business processes change
must be revolutionary
▶ Data variety and verifiability
are key opportunities
▶ Long tail of data variety is a
major shift in the data landscape
The Data Landscape
▶ Lack of Business-driven Big Data
strategies
▶ Need for format and data storage
technology standards
▶ Data exchange between
companies, institutions, individuals,
etc.
▶ Regulations & markets for data
access
▶ Human resources: Lack of skilled
data scientists and data
engineers
Biggest Blockers
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
17
DATA VALUE CHAIN - ANALYSIS
Key Insights
 Old technologies applied in a new context
(Volume, Variety, Velocity)
 Need for
 Stream data mining
 ‘Good’ data discovery
 Techniques to deal with dealing with both
very broad and very specific data
 Features to Increase Take-up
 Simplicity including the ‘democratisation of
semantic technologies’
 Ecosystems of tools
 Communities and Big Data will be involved
in new and interesting relationships
 In collection, improving data accuracy,
analysis and usage
 Improves community engagement
 Cross-sectorial uses of Big Data will open
up new business opportunities
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Social & Economic Impacts
• Cross-sectorial businesses
 Data-cleaning today requires a lot of work –
area for new business
 eHealth transformed from push (patient
visits doctor) to pull (continuous
monitoring)
 Also large scale trend analysis
 Conduit between research and individual
patient care
 E.g. creation of patient avatars based
on combination of specific data and
generic research data
 Proactive fraud detection even before it
happens
 Engaged citizens create and have access to
all data related to local, regional and
national social and policy issues
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
18
DATA VALUE CHAIN - CURATION
State of the Art
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Use Case
 Master Data Management (MDM)
 Centralized, single point of reference for
data representation
 Collaboration platforms & Web 2.0 tools
 Wikis, Content Management Systems (CMS)
 Early-stage crowdsourcing services
 E.g. Amazon Mechanical Turk
 Health and Life Sciences
 ChemSpider: Collaborative platform for data
curation of chemical structures
 Protein Data Bank: Data curation platform
for 3D protein structures
 FoldIt: Crowdsourcing game platform for
protein folding
 Telco, Media, Entertainment
 Press Association, Thomson Reuters, The
New York Times
 Use of semantic technologies to structure
and categorize unstructured data (texts,
images and videos), improving content
accessibility and reuse
 Retail
 Ebay: use of crowdsourcing services to
improve product categorization
 Unilever: use of crowdsourcing services for
product feedback and sentiment analysis
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
19
DATA VALUE CHAIN - CURATION
Future Requirements
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Emerging Trends
 Creation of incentives mechanisms for the
maintenance and publication of curated
datasets
 Definition of models for the data economy
 Understanding of social engagement
mechanisms
 Reduction of the cost associated with the data
curation task (scalability)
 Improvement of the Human-Data interaction
aspects. Enabling domain experts and casual
users to query, explore, transform and curate
data
 Inclusion of trustworthiness mechanisms in
data curation
 Integration and interoperability between data
curation platforms / Standardization
 Investigation of theoretical and domain specific
models for data curation
 Better integration between unstructured and
structured data and tools
Incentives and social engagement
 Better recognition of the data curation role
 Understanding of social engagement mechanisms
Economic Models
 Pre-competitive and public-private partnerships
Curation at Scale
 Evolution human computation and crowdsourcing
 Instrumenting popular apps for data curation
 General-purpose data curation pipelines
 Human-data interaction
Trust
 Capture of data curation decisions & provenance
management
 Fine-grained permission management models
and tools
Standardization & interoperability
 Standardized data model and vocabularies
 Better integration between data curation tools
Data Curation Models
 Minimum information models
 Nanopulications
 Theoretical principles and domain-specific model
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
20
DATA VALUE CHAIN - STORAGE
Key Insights
From state-of-the art
Capability to store and manage virtually
unbounded volumes of data has huge potential
to transform society and business
Better Scalability at Lower Operational Complexity
and Costs
Big Data Storage Has Become a Commodity
Business.
Maturity Has Reached Enterprise-Grade Level
Open Challenges
Unclear Adoption Paths for Non-IT Based Sectors
Lack of standards and best practices is major
barrier for adoption
Open Scalability Challenges
Privacy and Security is Lacking Behind
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Social & Economic
Impacts
 Enables a truly data-driven economy that is
able to manage a variety of data sets in an
integrated way
 Privacy and Security becomes more important
 Energy: high resolution smart meter data
contributes to the stable integration of renewable
energies
 Health Care: storage is key enabling technology
to solve integration challenges
 Media: enables using the voice of the crowd
 Transport: facilitates personalized and multi-
modal on large scale
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
21
DATA VALUE CHAIN - STORAGE
Future Requirements
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Emerging Trends
 Standardization of Query Interfaces
 Legal Frameworks for data management
 Data Tracing and provenance in order to assess
trust in data and ensure compliance
 Better support and scalability for storing
semantic data models, e.g. in health care
 Graph databases will become more important
 Columnar stores experience wider adoptions as
they are often faster in most practical
applications
 Convergence with analytical frameworks
Analytical databases for better performance and
lower development complexity (Mahout, Spark,
Hadoop/R, rasdaman, SciDB)
 Data Hubs and Markets: Hadoop-based
solutions tend to become a central integration
point for all enterprise, sectorial and cross-
sectorial data
 Smart Data: only use relevant data in network
and at the edge processessing
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
22
DATA VALUE CHAIN - USAGE
Key Insights
Data
Acquisition
Data
Analysis
Data
Curation
Data
Storage
Data
Usage
Social & Economic Impacts
 Key task of Data Usage is to support business
decisions
 There is great variety of Big Data Applications.
Some key areas are:
 Industry 4.0 (industrial internet)
 Predictive maintenance
 Smart data and service integration
 Interactive exploration: Big Data generates
insights beyond existing models, new analysis
interfaces must support browsing and modeling
(visual analytics)
 Opportunities for data markets: integration with
customer and supplier data, services from
infrastructure (Iaas) to software (SaaS) to
business processes (BPaaS) to knowledge (KaaS)
 Regulatory issues:
 Data protection and privacy
 Ownership of derived data
 Integration of business processes and data
 Beyond single company; data market places
 Transparency through Data Usage impacts:
 Economy
 Society
 Privacy
 Current drivers are big companies
 Opportunities for SMEs though services
(integration) and data market places
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
USE CASE ANALYSIS
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
24
PUBLIC SECTOR
• What the Public sector is facing…
– Internal data silos → no integration of systems across public bodies
– Raising pressure on productivity → lacks productivity compared to the private sector
– Older workforce → compared to the private sector, relies on a far older workforce
– Aging population → increasing demand for medical and social services
• …and in what can Big data help:
– Improving many areas in public sector services:
 Strengthen collaboration among PS bodies
 Social welfare
 Citizen services
 Improve healthcare
 Government transparency and accountability
 Public safety
– Open Data initiatives → act as catalysts in the development of a data ecosystem through
the opening of their own datasets, and actively managing their dissemination and use
– Analysis of sensor data: Real time data processing to search for patterns and
relationships and present real-time views on Smart Cities for better urban management
and citizen engagement
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
25
Data Privacy and
Security
Legal framework supporting data
access and usage of citizen’s
information
PUBLIC SECTOR REQUIREMENTS
OVERCOMING THE DATA SILO CULTURE
Data ownership
Fragmentation of data ownership
that leads to the data silo and
interoperability issues
Data Sharing
Overcome data silos
because of legacy
technologies
Common strategy at
all levels of
administration
So much energy is lost and will
remain so until a common
strategy is realized for the reuse
of cross technology platforms
Not-Technology-related Regulation & Technology Technology-related
Legislative and
political willingness
To promote legislation that allows
the reuse of data for other
purposes than those for what it
was originally collected
Openness of Data
Leadership to promote common
standards for Government Open
Data: APIs, format and schemas,
as well as for open data licensing
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
26
Data Security
Legal processes for data sharing
& communication are needed
Data Quality
Reliable insights for health-related
decisions require high data quality
HEALTHCARE REQUIREMENTS
High Investment
Long-term investments require
conjoint engagement of several
partners
Data Digitalization
only small percentage of data is
documented (lack of time) with
low quality
Semantic Annotation
transform unstructured data into
structured format
Data Sharing
Overcome data silos
and inflexible
interfaces
Business Cases
Undiscovered und unclaimed
potential business values
Value-based system
incentives
Current incentives enforce “high
number” instead of “high
quality” of care services
Not-Technology-related Regulation & Technology Technology-related
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
27
DATA POOLS IN HEALTHCARE
MAIN IMPACT BY INTEGRATING VARIOUS AND
HETEROGENEOUS DATA SOURCES
Clinical Data
 Owned by providers (such as
hospitals, care centers, physicians,
etc.)
 Encompass any information stored
within the classical hospital
information systems or EHR, such as
medical records, medical images, lab
results, genetic data, etc.
Claims, Cost &
Administrative Data
 Owned by providers and payors
 Encompass any data sets relevant for
reimbursement issues, such as
utilization of care, cost estimates,
claims, etc.
Pharmaceutical &
R&D Data
 Owned by the pharmaceutical
companies, research
labs/academia, government
 Encompass clinical trials,
clinical studies, population and
disease data, etc.
Patient Behaviour &
Sentiment Data
 Owned by consumers
or monitoring device
producer
 Encompass any
information related to
the patient behaviours
and preferences
Health data on the
web
 Mainly open source
 Examples are
websites such as
PatientLikeMe,
Linked Open Data,
etc.
Highest Impact
on integrated data sets
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
28
ENERGY SECTOR REQUISITES
STAKEHOLDERS, USE CASES, DATA SOURCES
TSO1 TSO2
DSO11
DSO12
DSO21
price signals
renewables
parks
prosumer
weather
e-car roaming
sustainable connected cities
industrial DSM
New business requires the combined value creation on mass data coming from a
variety of data sources, once the volume and velocity of energy data is mastered
Flexible energy tariffs for prosumers and eCars
• Actual energy usage instead standardized profiles
Efficient portfolio management
• detailed data on market prices,
• actual energy usage, feed-in
• weather conditions
Increased situational awareness
• intermittent, decentralized supply
• new responsive demand
• network asset and weather conditions
TSO – Transmission System Operator
DSO – Distribution System Operator
NETWORK OPERATORS RETAIL/WHOLESALE
END USER ENERGY MANAGEMENT
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
29
A digitally united
European Union
European stakeholders require reliable
minimally consistent rules and
regulation regarding digital rights and
regulations, whilst making use of all
technological enhancements
Data Access
Dynamic configurability of data
access of which data can be
collected for what purpose in
what granularity and time span
and location
Privacy and confidentiality
preserving data analytics are
required to enable the service
provider to retrieve the
knowledge without violating the
agreed upon granularity
ENERGY SECTOR REQUIREMENTS
Investment in
communication and
connectedness
Broadband communication or ICT in
general, needs to be widely available
alongside energy and transportation
infrastructure such that real-time data
access is given
Abstraction
from the actual big data
infrastructure is required to enable
(a) ease of use and (b) extensibility
and flexibility
Adaptive models of data
and system model
new knowledge extracted from
domain analytics or the ever changing
circumstances of the system can be
redeployed into the data analytics
framework
Data interpretability &
analytics
Expert and domain know-how must
be blended into the data
management and analytics. Data
analytics is required as part of every
step from data acquisition to data
management to data usage. Fast and
even real-time analytics is required to
support decisions, which need to be
made in ever shorter time spans
Skilled people
programming, statistical tools for
example needs to be part of engineering
education until big data technology
becomes more business user-friendly or
in case this becomes the “new normal”
Standardization &
Open Data
Open data is a great opportunity,
but, standardization is required.
Regarding data models,
representation, as well as protocols
practical migration paths
Not-Technology-related Regulation & Technology Technology-related
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
CONCLUSIONS
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
31
(SOME) COMMON REQUIREMENTS
Data Privacy and
Security
Legal frameowrks for data
sharing & communication are
needed
High Investment
Long-term investments require
conjoint engagement of several
partners
Not-Technology-related
Data Digitalization
only small percentage of data is
documented (lack of time) with
low quality
Semantic Annotation
transform unstructured data into
structured format
Data Sharing
Overcome data silos
and inflexible
interfaces
Business Cases
Undiscovered und unclaimed
potential business values
Regulation & Technology Technology-related
Data Quality
Reliable insights for health-related
decisions require high data quality
Openness of Data
Leadership to promote common
standards for Open Data: APIs,
format and schemas, as well as
covering licensing and legal
aspects
Data ownership
Fragmentation of data ownership
that leads to the data silo and
interoperability issues
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
32
TECHNOLOGY ROADMAP DEVELOPMENT
▶ Sector-specific data management technologies that
need to be in place before any big data scenario
can be realized
▶ Ensure that the relevant data of the sector is
available (e.g. EHR, IED)
▶ Driven by user/need driven (sector) analysis
▶ Focus on domain-specific requirements of data
management and related research questions
Enabling Technologies
▶ Large Gap between needs mentioned by the stakeholder and users of the (health) sector and the
technological opportunities envisioned by the technical groups
▶ Technical requirements mentioned within sectorial interviews mainly relate to efficient data
management approaches
▶ Technical opportunities mentioned by the technical groups highlight opportunities within a world
of open data access, such a the web
▶ Challenge: How to align the two perspectives?
▶ First Step “Focus on big data readiness”: Any technological requirements, such as efficient data
management, need to be addressed /solved before big data capabilities can be implemented
(enabling technologies)
▶ Second Step: “Elaborate big data opportunities”: Develop transitional scenarios that could be
realized assuming that the sector has achieved big data readiness (big data technologies), scenarios
should generate value.
Observations & Learnings from sector and technical investigations
▶ advanced/big IT capabilities that help to improve
healthcare delivery
▶ Data-driven: Investigate in public available
health data sources as basis for use case
brainstorming
▶ Technology-Driven: Investigate to which
extent applications /technologies from other
domains can be transferred to the healthcare
sector
Big data opportunities
We need to distinguish between
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
33
Domain Specific
Services
TOWARDS A BIG DATA ROADMAP
Secure Data
Sharing
Semantic
enrichment
Anonymisation
Scalable
Storage
Distributed Data
Acquistion
Scalable Storage
Analytical
Platforms
Analytical
Databases
Distributed Processing
Complex Event
Processing
Semantic Processing
Explorative Analysis
Energy
Public Sector
Health
Encryption
Manufacturing
Retail
Telco
Media & Entertainment
2nd JTC 1 SGBD Meeting - Workshop
BIG
Big Data Public Private Forum
34
SUMMARY
• BIG is contributing to the
creation of an industry-lead Big Data community in Europe
– Sector and Data Value Chain Analysis
– Initial results in SRIA
– Next steps: roadmap development, stakeholder platform
• 3 Data Sharing Use Cases Introduced
• Data Sharing is a key requirement for driving data driven-business on a
larger scale and evolve technologies
– Requires addressing issues across business (value), regulatory (society) and technical
(principled engineering and development of new capabilities)
– Initiatives such as PPP in combination with national will help address these issues
http://www.bigdatavalue.eu
2nd JTC 1 SGBD Meeting - Workshop
Thank you 
http://www.bigdatavalue.eu http://www.big-project.eu
Dr. Martin Strohbach
Senior Researcher, AGT International
Technical Lead Storage WG, BIG
mstrohbach@agtinternaitonal.com
Tilman Becker (DFKI, Data Usage), Edward Curry (NUI Galway, Data Curation), John
Domnique (STI, Data Analysis), Ricard Munné (ATOS, Public Sector), Sebnem
Rusitschka (Siemens, Energy and Transport), Holger Ziekow (AGT, PEC), Sonja Zillner
(Siemens, Health)

Contenu connexe

Tendances

Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing
Seungyun Lee
 
Big Data for Smart City
Big Data for Smart CityBig Data for Smart City
Big Data for Smart City
Koltiva
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
Matthew Urdan
 

Tendances (20)

Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing Issues on Big Data & Cloud Computing
Issues on Big Data & Cloud Computing
 
Study: #Big Data in #Austria
Study: #Big Data in #AustriaStudy: #Big Data in #Austria
Study: #Big Data in #Austria
 
Big Data for Smart City
Big Data for Smart CityBig Data for Smart City
Big Data for Smart City
 
Big data
Big dataBig data
Big data
 
Data 4 AI: For European Economic Competitiveness and Societal Progress
Data 4 AI: For European Economic Competitiveness and Societal ProgressData 4 AI: For European Economic Competitiveness and Societal Progress
Data 4 AI: For European Economic Competitiveness and Societal Progress
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 ConferenceBig Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 Conference
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
On Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challengesOn Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challenges
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
BIG DATA(PPT)
BIG DATA(PPT)BIG DATA(PPT)
BIG DATA(PPT)
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
NewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big DataNewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big Data
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 
Big Data’s Big Impact on Businesses
Big Data’s Big Impact on BusinessesBig Data’s Big Impact on Businesses
Big Data’s Big Impact on Businesses
 
Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computing
 
SC4 Workshop 1: Logistics and big data German herrero
SC4 Workshop 1: Logistics and big data  German herreroSC4 Workshop 1: Logistics and big data  German herrero
SC4 Workshop 1: Logistics and big data German herrero
 

En vedette

Big Data Public-Private Forum_General Presentation
Big Data Public-Private Forum_General PresentationBig Data Public-Private Forum_General Presentation
Big Data Public-Private Forum_General Presentation
BIG Project
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
Kanwal Prakash Singh
 
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev KumarApache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
Yahoo Developer Network
 

En vedette (17)

Big Data Public-Private Forum_General Presentation
Big Data Public-Private Forum_General PresentationBig Data Public-Private Forum_General Presentation
Big Data Public-Private Forum_General Presentation
 
India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015India Analytics and Big Data Summit 2015
India Analytics and Big Data Summit 2015
 
Big data in India
Big data in IndiaBig data in India
Big data in India
 
Public policy in the ‘big data’ age: Gavin Freeguard introduction
Public policy in the ‘big data’ age: Gavin Freeguard introductionPublic policy in the ‘big data’ age: Gavin Freeguard introduction
Public policy in the ‘big data’ age: Gavin Freeguard introduction
 
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev KumarApache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Snajeev Kumar
 
Public policy in the ‘big data’ age: Roeland Beerten presentation
Public policy in the ‘big data’ age: Roeland Beerten presentationPublic policy in the ‘big data’ age: Roeland Beerten presentation
Public policy in the ‘big data’ age: Roeland Beerten presentation
 
Hitachi kokusai electric inc. financial and strategic swot analysis review
Hitachi kokusai electric inc.    financial and strategic swot analysis reviewHitachi kokusai electric inc.    financial and strategic swot analysis review
Hitachi kokusai electric inc. financial and strategic swot analysis review
 
Big data Analytics opportunities in India
Big data  Analytics opportunities in IndiaBig data  Analytics opportunities in India
Big data Analytics opportunities in India
 
Public policy in the ‘big data’ age: Martin Ralphs presentation
Public policy in the ‘big data’ age: Martin Ralphs presentationPublic policy in the ‘big data’ age: Martin Ralphs presentation
Public policy in the ‘big data’ age: Martin Ralphs presentation
 
Barbara Ryan @OECD - 21 Sept 2015 - Water Policy in the Age of Big Data
Barbara Ryan @OECD - 21 Sept 2015 - Water Policy in the Age of Big DataBarbara Ryan @OECD - 21 Sept 2015 - Water Policy in the Age of Big Data
Barbara Ryan @OECD - 21 Sept 2015 - Water Policy in the Age of Big Data
 
SAS Forum India: Big Data, Big Analytics & Bad Behaviour - Fighting Financial...
SAS Forum India: Big Data, Big Analytics & Bad Behaviour - Fighting Financial...SAS Forum India: Big Data, Big Analytics & Bad Behaviour - Fighting Financial...
SAS Forum India: Big Data, Big Analytics & Bad Behaviour - Fighting Financial...
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
The data deluge: Five years on
The data deluge: Five years on The data deluge: Five years on
The data deluge: Five years on
 
Big data: Bringing competition policy to the digital era – Background note – ...
Big data: Bringing competition policy to the digital era – Background note – ...Big data: Bringing competition policy to the digital era – Background note – ...
Big data: Bringing competition policy to the digital era – Background note – ...
 
Hitachi Data Systems Big Data Roadmap
Hitachi Data Systems Big Data RoadmapHitachi Data Systems Big Data Roadmap
Hitachi Data Systems Big Data Roadmap
 
Big data: Bringing competition policy to the digital era – STUCKE – November ...
Big data: Bringing competition policy to the digital era – STUCKE – November ...Big data: Bringing competition policy to the digital era – STUCKE – November ...
Big data: Bringing competition policy to the digital era – STUCKE – November ...
 
Data strategy in a Big Data world
Data strategy in a Big Data worldData strategy in a Big Data world
Data strategy in a Big Data world
 

Similaire à Towards a big data roadmap for europe

BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
maigva
 
D2-5_BIG DATA EmilianoAnzellotti_v2 [
D2-5_BIG DATA EmilianoAnzellotti_v2 [D2-5_BIG DATA EmilianoAnzellotti_v2 [
D2-5_BIG DATA EmilianoAnzellotti_v2 [
Emiliano Anzellotti
 

Similaire à Towards a big data roadmap for europe (20)

Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in Europe
 
New Horizons for a Data-Driven Economy – A Roadmap for Big Data in Europe
New Horizons for a Data-Driven Economy – A Roadmap for Big Data in Europe New Horizons for a Data-Driven Economy – A Roadmap for Big Data in Europe
New Horizons for a Data-Driven Economy – A Roadmap for Big Data in Europe
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
 
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaBIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
 
ppt1.pptx
ppt1.pptxppt1.pptx
ppt1.pptx
 
Proposal for the Theme on Big Data.pdf
Proposal for the Theme on Big Data.pdfProposal for the Theme on Big Data.pdf
Proposal for the Theme on Big Data.pdf
 
Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private Partnership
 
Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
L'economia europea dei dati. Politiche europee e opportunità di finanziamento...
L'economia europea dei dati. Politiche europee e opportunità di finanziamento...L'economia europea dei dati. Politiche europee e opportunità di finanziamento...
L'economia europea dei dati. Politiche europee e opportunità di finanziamento...
 
Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"Borys Pratsiuk "How to be NVidia partner"
Borys Pratsiuk "How to be NVidia partner"
 
Building blocks for fair digital society
Building blocks for fair digital societyBuilding blocks for fair digital society
Building blocks for fair digital society
 
Rdaeu russia_fg_1_july2014_final
Rdaeu  russia_fg_1_july2014_finalRdaeu  russia_fg_1_july2014_final
Rdaeu russia_fg_1_july2014_final
 
D2-5_BIG DATA EmilianoAnzellotti_v2 [
D2-5_BIG DATA EmilianoAnzellotti_v2 [D2-5_BIG DATA EmilianoAnzellotti_v2 [
D2-5_BIG DATA EmilianoAnzellotti_v2 [
 
IoT 2014 Value Creation Workshop: SDIL
IoT 2014 Value Creation Workshop: SDILIoT 2014 Value Creation Workshop: SDIL
IoT 2014 Value Creation Workshop: SDIL
 
Economic Challenges of Big Data
Economic Challenges of Big DataEconomic Challenges of Big Data
Economic Challenges of Big Data
 
Cross-Disciplinary Insights on Big Data Challenges and Solutions
Cross-Disciplinary Insights on Big Data Challenges and SolutionsCross-Disciplinary Insights on Big Data Challenges and Solutions
Cross-Disciplinary Insights on Big Data Challenges and Solutions
 
Big Data a Catalunya
Big Data a CatalunyaBig Data a Catalunya
Big Data a Catalunya
 
Big Data a Catalunya
Big Data a CatalunyaBig Data a Catalunya
Big Data a Catalunya
 
Europe rules – making the fair data economy flourish
Europe rules – making the fair data economy flourishEurope rules – making the fair data economy flourish
Europe rules – making the fair data economy flourish
 
13 pv-do es-18-bigdata-v3
13 pv-do es-18-bigdata-v313 pv-do es-18-bigdata-v3
13 pv-do es-18-bigdata-v3
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Towards a big data roadmap for europe

  • 1. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum TOWARDS A BIG DATA ROADMAP FOR EUROPE Martin Strohbach, AGT International Tilman Becker, Edward Curry, John Domnique, Ricard Munné, Sebnem Rusitschka, Sonja Zillner
  • 2. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 2 OVERVIEW • Big Data Public Private Forum and Partnership • 3 Data Sharing Use Cases • Methodology for creating a community based roadmap • Findings from the Use Cases, Sectors and Data Value Chain Analysis • Impact of the findings towards a European roadmap • Conclusions
  • 3. 2nd JTC 1 SGBD Meeting - Workshop THE EU PROJECT BIG BIG DATA PUBLIC PRIVATE FORUM Europe needs a clear strategy for leveraging Big Data Economy in Europe Work at technical, business and policy levels, shaping the future through the positioning of Big Data in Horizon 2020. Bringing the necessary stakeholders into a sustainable industry-led initiative, which will greatly contribute to enhance the EU competitiveness taking full advantage of Big Data technologies. Objectives Trigger Type of project: Coordination & Support Action Project start date: September 2012 Duration: 26 months Call: FP7-ICT-2011-8 Budget: 3,038 M€ Consortium: 11 partners Facts
  • 4. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 4 BIG DATA PRIVATE PUBLIC PARTNERSHIP • Objectives: „The goal […] is to increase the amount of productive European economic activities and the number of European jobs that depend on the availability of high quality data assets and the technologies needed to derive value from them.” (Source. Strategic Research and Innovation Agenda, SRIA) • Neelie Kroes, EU Commissioner for the Digital Agenda: „This is a revolution and I want the EU to be right at the front of it“ • 29 European organisations from industry and research • Results of public consultattion to be presented at NESSI Summit Brussels, 27th May
  • 5. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum SELECTED USE CASES
  • 6. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 6 OVERCOMING THE DATA SILO CULTURE IN PUBLIC SECTOR ID data Tax data Civil Registry Home Office Tax Agency Civil Registry Education Ministry Local Administration Data aggregation process Citizen Scholarship Registry
  • 7. 2nd JTC 1 SGBD Meeting - Workshop USE CASE HEALTHCARE SECONDARY USAGE OF HEALTH DATA ▶ Secondary usage of health data is defined as the aggregation, analysis and concise presentation of clinical, financial, administrative as well as other related health data ▶ in order to discover new valuable knowledge, for instance to identify trends, predict outcomes or influence patient care, drug development, or therapy choices. Description Example Use Cases ▶ Example 1: Patient recruiting and profiling suitable for conducting clinical studies. Today, often clinical studies, in particular studies investigating rare diseases, fail due to the fact that not enough patients are available for conducting clinical studies (Berliner Forschungsplattform Gesundheit (BFG), Astra Zeneca) ▶ Example 2: Social Media Based Medication Intelligence Mining of health data on the discover insights about side effects of medications (e.g. Treato)
  • 8. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 8 PEER ENERGY CLOUD Smart grid pilot in Saarlouis 100 households Berlin SaarlouisInnovation award Engage consumers to optimally use local solar energy  Understand consumption and save  Trade solar energy in the neighborhood to balance the grid
  • 9. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 9 DEVICE LEVEL ENERGY MONITORING Monitored/controlled grid today Monitored/controlled grid tomorrow Germany aims at 30% clean/renewable energy by 2020, seeking to build a smart grid Sensors today Sensors tomorrow (consumer level) Energy Consumption Temperature Movement,...
  • 10. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 10 GETTING READY FOR DATA VOLUMES IN FUTURE GRIDS PeerEnergyCloud Pilots allows us to get ready for future data volumes today How much data is really needed for what? 1 value per year 35.040 values per year today smart metering 540 million values per year PeerEnergy- Cloud ? Billion values per year Future possibilities Optimum? 7 devices per household every 2 seconds , 4-5 measurements per devices every 15 minutes real-time analytics on mass data (grouped aggregation) Scalable statistics over hundreds of millions of measurements Automatic detection of load anomalies (spotting inefficiencies and defects) Household activity state inference and prediction
  • 11. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum BIG METHODOLOGY
  • 12. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 12 SECTORIAL FORUMS AND TECHNICAL WORKING GROUPS Health Public Sector Finance & Insurance Telco, Media& Entertainment Manufacturing, Retail, Energy, Transport Needs Offerings Big Data Value Chain Technical Working Groups Industry Driven Sectorial Forums Data Acquisition Data Analysis Data Curation Data Storage Data Usage • Structured data • Unstructured data • Event processing • Sensor networks • Protocols • Real-time • Data streams • Multimodality • Structured data • Unstructured data • Event processing • Sensor networks • Protocols • Real-time • Data streams • Multimodality • Stream mining • Semantic analysis • Machine learning • Information extraction • Linked Data • Data discovery • ‘Whole world’ semantics • Ecosystems • Community data analysis • Cross-sectorial data analysis • Stream mining • Semantic analysis • Machine learning • Information extraction • Linked Data • Data discovery • ‘Whole world’ semantics • Ecosystems • Community data analysis • Cross-sectorial data analysis • Data Quality • Trust / Provenance • Annotation • Data validation • Human-Data Interaction • Top-down/Bottom-up • Community / Crowd • Human Computation • Curation at scale • Incentivisation • Automation • Interoperability • Data Quality • Trust / Provenance • Annotation • Data validation • Human-Data Interaction • Top-down/Bottom-up • Community / Crowd • Human Computation • Curation at scale • Incentivisation • Automation • Interoperability • In-Memory DBs • NoSQL DBs • NewSQL DBs • Cloud storage • Query Interfaces • Scalability and Performance • Data Models • Consistency, Availability, Partition- tolerance • Security and Privacy • Standardization • In-Memory DBs • NoSQL DBs • NewSQL DBs • Cloud storage • Query Interfaces • Scalability and Performance • Data Models • Consistency, Availability, Partition- tolerance • Security and Privacy • Standardization • Decision support • Prediction • In-use analytics • Simulation • Exploration • Visualisation • Modeling • Control • Domain-specific usage • Decision support • Prediction • In-use analytics • Simulation • Exploration • Visualisation • Modeling • Control • Domain-specific usage
  • 13. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 13 SECTOR ANALYSIS METHODOLOGY
  • 14. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 14 TECHNICAL WORKGROUP APPROACH Senior Academic Senior Management Middle Researcher Middle Management Position in Organisation University MNC SME Other Types of Organisations 1. Literature & Technical Survey 2. Subject Matter Expert Interviews 3. Stakeholder Workshops 4. Online Questionnaire (with NESSI) • Early adopters • Business enablement • Technical maturity • Key Opinion Leaders Methodology Interviewee Breakdown Target Interviewee
  • 15. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum TWG FINDINGS
  • 16. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 16 KEY INSIGHTS Key Trends ▶ Lower usability barrier for data tools ▶ Blended human and algorithmic data processing for coping with for data quality ▶ Leveraging large communities (crowds) ▶ Need for semantic standardized data representation ▶ Significant increase in use of new data models (i.e. graph) (expressivity and flexibility) ▶ Much of (Big Data) technology is evolving evolutionary ▶ But business processes change must be revolutionary ▶ Data variety and verifiability are key opportunities ▶ Long tail of data variety is a major shift in the data landscape The Data Landscape ▶ Lack of Business-driven Big Data strategies ▶ Need for format and data storage technology standards ▶ Data exchange between companies, institutions, individuals, etc. ▶ Regulations & markets for data access ▶ Human resources: Lack of skilled data scientists and data engineers Biggest Blockers
  • 17. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 17 DATA VALUE CHAIN - ANALYSIS Key Insights  Old technologies applied in a new context (Volume, Variety, Velocity)  Need for  Stream data mining  ‘Good’ data discovery  Techniques to deal with dealing with both very broad and very specific data  Features to Increase Take-up  Simplicity including the ‘democratisation of semantic technologies’  Ecosystems of tools  Communities and Big Data will be involved in new and interesting relationships  In collection, improving data accuracy, analysis and usage  Improves community engagement  Cross-sectorial uses of Big Data will open up new business opportunities Data Acquisition Data Analysis Data Curation Data Storage Data Usage Social & Economic Impacts • Cross-sectorial businesses  Data-cleaning today requires a lot of work – area for new business  eHealth transformed from push (patient visits doctor) to pull (continuous monitoring)  Also large scale trend analysis  Conduit between research and individual patient care  E.g. creation of patient avatars based on combination of specific data and generic research data  Proactive fraud detection even before it happens  Engaged citizens create and have access to all data related to local, regional and national social and policy issues
  • 18. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 18 DATA VALUE CHAIN - CURATION State of the Art Data Acquisition Data Analysis Data Curation Data Storage Data Usage Use Case  Master Data Management (MDM)  Centralized, single point of reference for data representation  Collaboration platforms & Web 2.0 tools  Wikis, Content Management Systems (CMS)  Early-stage crowdsourcing services  E.g. Amazon Mechanical Turk  Health and Life Sciences  ChemSpider: Collaborative platform for data curation of chemical structures  Protein Data Bank: Data curation platform for 3D protein structures  FoldIt: Crowdsourcing game platform for protein folding  Telco, Media, Entertainment  Press Association, Thomson Reuters, The New York Times  Use of semantic technologies to structure and categorize unstructured data (texts, images and videos), improving content accessibility and reuse  Retail  Ebay: use of crowdsourcing services to improve product categorization  Unilever: use of crowdsourcing services for product feedback and sentiment analysis
  • 19. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 19 DATA VALUE CHAIN - CURATION Future Requirements Data Acquisition Data Analysis Data Curation Data Storage Data Usage Emerging Trends  Creation of incentives mechanisms for the maintenance and publication of curated datasets  Definition of models for the data economy  Understanding of social engagement mechanisms  Reduction of the cost associated with the data curation task (scalability)  Improvement of the Human-Data interaction aspects. Enabling domain experts and casual users to query, explore, transform and curate data  Inclusion of trustworthiness mechanisms in data curation  Integration and interoperability between data curation platforms / Standardization  Investigation of theoretical and domain specific models for data curation  Better integration between unstructured and structured data and tools Incentives and social engagement  Better recognition of the data curation role  Understanding of social engagement mechanisms Economic Models  Pre-competitive and public-private partnerships Curation at Scale  Evolution human computation and crowdsourcing  Instrumenting popular apps for data curation  General-purpose data curation pipelines  Human-data interaction Trust  Capture of data curation decisions & provenance management  Fine-grained permission management models and tools Standardization & interoperability  Standardized data model and vocabularies  Better integration between data curation tools Data Curation Models  Minimum information models  Nanopulications  Theoretical principles and domain-specific model
  • 20. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 20 DATA VALUE CHAIN - STORAGE Key Insights From state-of-the art Capability to store and manage virtually unbounded volumes of data has huge potential to transform society and business Better Scalability at Lower Operational Complexity and Costs Big Data Storage Has Become a Commodity Business. Maturity Has Reached Enterprise-Grade Level Open Challenges Unclear Adoption Paths for Non-IT Based Sectors Lack of standards and best practices is major barrier for adoption Open Scalability Challenges Privacy and Security is Lacking Behind Data Acquisition Data Analysis Data Curation Data Storage Data Usage Social & Economic Impacts  Enables a truly data-driven economy that is able to manage a variety of data sets in an integrated way  Privacy and Security becomes more important  Energy: high resolution smart meter data contributes to the stable integration of renewable energies  Health Care: storage is key enabling technology to solve integration challenges  Media: enables using the voice of the crowd  Transport: facilitates personalized and multi- modal on large scale
  • 21. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 21 DATA VALUE CHAIN - STORAGE Future Requirements Data Acquisition Data Analysis Data Curation Data Storage Data Usage Emerging Trends  Standardization of Query Interfaces  Legal Frameworks for data management  Data Tracing and provenance in order to assess trust in data and ensure compliance  Better support and scalability for storing semantic data models, e.g. in health care  Graph databases will become more important  Columnar stores experience wider adoptions as they are often faster in most practical applications  Convergence with analytical frameworks Analytical databases for better performance and lower development complexity (Mahout, Spark, Hadoop/R, rasdaman, SciDB)  Data Hubs and Markets: Hadoop-based solutions tend to become a central integration point for all enterprise, sectorial and cross- sectorial data  Smart Data: only use relevant data in network and at the edge processessing
  • 22. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 22 DATA VALUE CHAIN - USAGE Key Insights Data Acquisition Data Analysis Data Curation Data Storage Data Usage Social & Economic Impacts  Key task of Data Usage is to support business decisions  There is great variety of Big Data Applications. Some key areas are:  Industry 4.0 (industrial internet)  Predictive maintenance  Smart data and service integration  Interactive exploration: Big Data generates insights beyond existing models, new analysis interfaces must support browsing and modeling (visual analytics)  Opportunities for data markets: integration with customer and supplier data, services from infrastructure (Iaas) to software (SaaS) to business processes (BPaaS) to knowledge (KaaS)  Regulatory issues:  Data protection and privacy  Ownership of derived data  Integration of business processes and data  Beyond single company; data market places  Transparency through Data Usage impacts:  Economy  Society  Privacy  Current drivers are big companies  Opportunities for SMEs though services (integration) and data market places
  • 23. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum USE CASE ANALYSIS
  • 24. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 24 PUBLIC SECTOR • What the Public sector is facing… – Internal data silos → no integration of systems across public bodies – Raising pressure on productivity → lacks productivity compared to the private sector – Older workforce → compared to the private sector, relies on a far older workforce – Aging population → increasing demand for medical and social services • …and in what can Big data help: – Improving many areas in public sector services:  Strengthen collaboration among PS bodies  Social welfare  Citizen services  Improve healthcare  Government transparency and accountability  Public safety – Open Data initiatives → act as catalysts in the development of a data ecosystem through the opening of their own datasets, and actively managing their dissemination and use – Analysis of sensor data: Real time data processing to search for patterns and relationships and present real-time views on Smart Cities for better urban management and citizen engagement
  • 25. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 25 Data Privacy and Security Legal framework supporting data access and usage of citizen’s information PUBLIC SECTOR REQUIREMENTS OVERCOMING THE DATA SILO CULTURE Data ownership Fragmentation of data ownership that leads to the data silo and interoperability issues Data Sharing Overcome data silos because of legacy technologies Common strategy at all levels of administration So much energy is lost and will remain so until a common strategy is realized for the reuse of cross technology platforms Not-Technology-related Regulation & Technology Technology-related Legislative and political willingness To promote legislation that allows the reuse of data for other purposes than those for what it was originally collected Openness of Data Leadership to promote common standards for Government Open Data: APIs, format and schemas, as well as for open data licensing
  • 26. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 26 Data Security Legal processes for data sharing & communication are needed Data Quality Reliable insights for health-related decisions require high data quality HEALTHCARE REQUIREMENTS High Investment Long-term investments require conjoint engagement of several partners Data Digitalization only small percentage of data is documented (lack of time) with low quality Semantic Annotation transform unstructured data into structured format Data Sharing Overcome data silos and inflexible interfaces Business Cases Undiscovered und unclaimed potential business values Value-based system incentives Current incentives enforce “high number” instead of “high quality” of care services Not-Technology-related Regulation & Technology Technology-related
  • 27. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 27 DATA POOLS IN HEALTHCARE MAIN IMPACT BY INTEGRATING VARIOUS AND HETEROGENEOUS DATA SOURCES Clinical Data  Owned by providers (such as hospitals, care centers, physicians, etc.)  Encompass any information stored within the classical hospital information systems or EHR, such as medical records, medical images, lab results, genetic data, etc. Claims, Cost & Administrative Data  Owned by providers and payors  Encompass any data sets relevant for reimbursement issues, such as utilization of care, cost estimates, claims, etc. Pharmaceutical & R&D Data  Owned by the pharmaceutical companies, research labs/academia, government  Encompass clinical trials, clinical studies, population and disease data, etc. Patient Behaviour & Sentiment Data  Owned by consumers or monitoring device producer  Encompass any information related to the patient behaviours and preferences Health data on the web  Mainly open source  Examples are websites such as PatientLikeMe, Linked Open Data, etc. Highest Impact on integrated data sets
  • 28. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 28 ENERGY SECTOR REQUISITES STAKEHOLDERS, USE CASES, DATA SOURCES TSO1 TSO2 DSO11 DSO12 DSO21 price signals renewables parks prosumer weather e-car roaming sustainable connected cities industrial DSM New business requires the combined value creation on mass data coming from a variety of data sources, once the volume and velocity of energy data is mastered Flexible energy tariffs for prosumers and eCars • Actual energy usage instead standardized profiles Efficient portfolio management • detailed data on market prices, • actual energy usage, feed-in • weather conditions Increased situational awareness • intermittent, decentralized supply • new responsive demand • network asset and weather conditions TSO – Transmission System Operator DSO – Distribution System Operator NETWORK OPERATORS RETAIL/WHOLESALE END USER ENERGY MANAGEMENT
  • 29. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 29 A digitally united European Union European stakeholders require reliable minimally consistent rules and regulation regarding digital rights and regulations, whilst making use of all technological enhancements Data Access Dynamic configurability of data access of which data can be collected for what purpose in what granularity and time span and location Privacy and confidentiality preserving data analytics are required to enable the service provider to retrieve the knowledge without violating the agreed upon granularity ENERGY SECTOR REQUIREMENTS Investment in communication and connectedness Broadband communication or ICT in general, needs to be widely available alongside energy and transportation infrastructure such that real-time data access is given Abstraction from the actual big data infrastructure is required to enable (a) ease of use and (b) extensibility and flexibility Adaptive models of data and system model new knowledge extracted from domain analytics or the ever changing circumstances of the system can be redeployed into the data analytics framework Data interpretability & analytics Expert and domain know-how must be blended into the data management and analytics. Data analytics is required as part of every step from data acquisition to data management to data usage. Fast and even real-time analytics is required to support decisions, which need to be made in ever shorter time spans Skilled people programming, statistical tools for example needs to be part of engineering education until big data technology becomes more business user-friendly or in case this becomes the “new normal” Standardization & Open Data Open data is a great opportunity, but, standardization is required. Regarding data models, representation, as well as protocols practical migration paths Not-Technology-related Regulation & Technology Technology-related
  • 30. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum CONCLUSIONS
  • 31. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 31 (SOME) COMMON REQUIREMENTS Data Privacy and Security Legal frameowrks for data sharing & communication are needed High Investment Long-term investments require conjoint engagement of several partners Not-Technology-related Data Digitalization only small percentage of data is documented (lack of time) with low quality Semantic Annotation transform unstructured data into structured format Data Sharing Overcome data silos and inflexible interfaces Business Cases Undiscovered und unclaimed potential business values Regulation & Technology Technology-related Data Quality Reliable insights for health-related decisions require high data quality Openness of Data Leadership to promote common standards for Open Data: APIs, format and schemas, as well as covering licensing and legal aspects Data ownership Fragmentation of data ownership that leads to the data silo and interoperability issues
  • 32. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 32 TECHNOLOGY ROADMAP DEVELOPMENT ▶ Sector-specific data management technologies that need to be in place before any big data scenario can be realized ▶ Ensure that the relevant data of the sector is available (e.g. EHR, IED) ▶ Driven by user/need driven (sector) analysis ▶ Focus on domain-specific requirements of data management and related research questions Enabling Technologies ▶ Large Gap between needs mentioned by the stakeholder and users of the (health) sector and the technological opportunities envisioned by the technical groups ▶ Technical requirements mentioned within sectorial interviews mainly relate to efficient data management approaches ▶ Technical opportunities mentioned by the technical groups highlight opportunities within a world of open data access, such a the web ▶ Challenge: How to align the two perspectives? ▶ First Step “Focus on big data readiness”: Any technological requirements, such as efficient data management, need to be addressed /solved before big data capabilities can be implemented (enabling technologies) ▶ Second Step: “Elaborate big data opportunities”: Develop transitional scenarios that could be realized assuming that the sector has achieved big data readiness (big data technologies), scenarios should generate value. Observations & Learnings from sector and technical investigations ▶ advanced/big IT capabilities that help to improve healthcare delivery ▶ Data-driven: Investigate in public available health data sources as basis for use case brainstorming ▶ Technology-Driven: Investigate to which extent applications /technologies from other domains can be transferred to the healthcare sector Big data opportunities We need to distinguish between
  • 33. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 33 Domain Specific Services TOWARDS A BIG DATA ROADMAP Secure Data Sharing Semantic enrichment Anonymisation Scalable Storage Distributed Data Acquistion Scalable Storage Analytical Platforms Analytical Databases Distributed Processing Complex Event Processing Semantic Processing Explorative Analysis Energy Public Sector Health Encryption Manufacturing Retail Telco Media & Entertainment
  • 34. 2nd JTC 1 SGBD Meeting - Workshop BIG Big Data Public Private Forum 34 SUMMARY • BIG is contributing to the creation of an industry-lead Big Data community in Europe – Sector and Data Value Chain Analysis – Initial results in SRIA – Next steps: roadmap development, stakeholder platform • 3 Data Sharing Use Cases Introduced • Data Sharing is a key requirement for driving data driven-business on a larger scale and evolve technologies – Requires addressing issues across business (value), regulatory (society) and technical (principled engineering and development of new capabilities) – Initiatives such as PPP in combination with national will help address these issues http://www.bigdatavalue.eu
  • 35. 2nd JTC 1 SGBD Meeting - Workshop Thank you  http://www.bigdatavalue.eu http://www.big-project.eu Dr. Martin Strohbach Senior Researcher, AGT International Technical Lead Storage WG, BIG mstrohbach@agtinternaitonal.com Tilman Becker (DFKI, Data Usage), Edward Curry (NUI Galway, Data Curation), John Domnique (STI, Data Analysis), Ricard Munné (ATOS, Public Sector), Sebnem Rusitschka (Siemens, Energy and Transport), Holger Ziekow (AGT, PEC), Sonja Zillner (Siemens, Health)