In this presentation we discuss some of the results of the BIG project, including an analysis of foundational Big Data research technologies, technology and strategy roadmaps that enable businesses to understand the potential of Big Data technologies across different sectors, and the collaboration and dissemination infrastructure needed to link technology suppliers, integrators and leading user organizations.
Edward Curry is leading the Technical Working Group of the BIG Project with over 30 committed experts along the big data value chain (Acquisition, Analysis, Curation, Storage, Usage). With the help of the other technical leads, he will elaborate on the key technology trends identified in the BIG Project and how they bring data-driven value to industrial sectors.
1. BIG Final Event Workshop - September 30, 2014 - Heidelberg
BIG
Big Data Public Private Forum
KEY TECHNOLOGY TRENDS
FOR BIG DATA IN EUROPE
Edward Curry, Insight @ NUI Galway
Tilman Becker, Andre Freitas, John Domingue, Helen
Lippell, Felicia Lobillo, Ricard Munné, Axel Ngonga,
Denise Paradowski, Sebnem Rusitschka, Holger
Ziekow, Martin Strohbach, Sonja Zillner, and all the
many many contributors to the Technical Working
Groups and Sectorial Forums
2. OVERVIEW
Business Context
Methodology
Value-Driven Use Case
Technology Trends
3. BUSINESS CONTEXT
4. “This is a revolution: and I want
the EU to be right at the front of
it.”
Neelie Kroes, Vice-President of the
European Commission responsible for
the Digital Agenda, March 2013
BIG DATA IN EUROPE
“Possibly one of the few last
chances for Europe’s software
industry to take a true leadership”
K-H Streibich, CEO
5. Open Innovation Open Data
INCREASED OPENNESS
Ecosystems Approaches
Community-based
Tools and Data
6. BIG METHODOLOGY
7. Industry Driven Sectorial Forums
SECTORIAL FORUMS AND TECHNICAL WORKING GROUPS
• Health
• Public Sector
• Finance & Insurance
• Telco, Media & Entertainment
• Manufacturing, Retail, Energy, Transport
Needs / Offerings
Big Data Value Chain
Technical Working Groups
Data Acquisition
• Structured data
• Unstructured data
• Event processing
• Sensor networks
• Protocols
• Real-time
• Data streams
• Multimodality
Data Analysis
• Stream mining
• Semantic analysis
• Machine learning
• Information extraction
• Linked Data
• Data discovery
• ‘Whole world’ semantics
• Ecosystems
• Community data analysis
• Cross-sectorial data analysis
Data Curation
• Data Quality
• Trust / Provenance
• Annotation
• Data validation
• Human-Data Interaction
• Top-down/Bottom-up
• Community / Crowd
• Human Computation
• Curation at scale
• Incentivisation
• Automation
• Interoperability
Data Storage
• In-Memory DBs
• NoSQL DBs
• NewSQL DBs
• Cloud storage
• Query Interfaces
• Scalability and Performance
• Data Models
• Consistency, Availability, Partition-tolerance
• Security and Privacy
• Standardization
Data Usage
• Decision support
• Prediction
• In-use analytics
• Simulation
• Exploration
• Visualisation
• Modeling
• Control
• Domain-specific usage
8. SECTORIAL ANALYSIS METHODOLOGY
9. TECHNICAL WORKGROUP APPROACH
Methodology
1. Literature & Technical Survey
2. Subject Matter Expert Interviews
3. Stakeholder Workshops
4. Online Questionnaire (with NESSI)
Target Interviewee
• Early adopters
• Business enablement
• Technical maturity
• Key Opinion Leaders
Interviewee Breakdown
• Position in Organisation: Senior Academic, Senior Management, Middle Management, Middle Researcher
• Types of Organisations: University, MNC, SME, Other
10. SUBJECT MATTER EXPERT INTERVIEWS
11. Expert Interviews Technical Whitepapers
▶ Executive Overview
▶ Key Insights
▶ Social & Economic Impact
▶ Concise State of the Art
▶ Future Requirements & Emerging Trends
▶ Sector-specific Case Studies
WORKING GROUP RESULTS
Interviews, Technical White Papers, Sector's requisites
and Roadmaps available on: http://www.big-project.eu
12. VALUE-DRIVEN USE CASE
13. VALUE-DRIVEN USE CASES
Industry Driven Sectorial Forums
• Health
• Public Sector
• Finance & Insurance
• Telco, Media & Entertainment
• Manufacturing, Retail, Energy, Transport
Use cases
• Data-Driven Therapy Guidance
• Public Service Integration with Open Data
• Data Markets
• Retail
• Industry 4.0
• Increasing Productivity of Wind Farms
14. Technology Evolution
Process Revolution
THE DATA LANDSCAPE (1/2)
▶ Much of Big Data technology is evolving in an
evolutionary way
▶ Old technologies applied in a new context
▶ Volume, Variety, Velocity, Value …
▶ Business process change must be
revolutionary to enable new opportunities
▶ Industry 4.0 (industrial internet)
▶ Predictive maintenance
▶ Opportunities for data-driven improvements
▶ integration with customer and supplier data
▶ Moving from infrastructure services (IaaS) to
software (SaaS) to business processes (BPaaS) to
knowledge (KaaS)
15. Variety and Reuse
THE DATA LANDSCAPE (2/2)
▶ The long tail of data variety is a major shift in
the data landscape
▶ Coping with data variety and verifiability are
central challenges and opportunities for Big Data
▶ Cross-sectorial uses of Big Data will open up
new business opportunities
▶ Need for scalable approaches to cope with data
under different format and semantic assumptions
16. Secondary Usage of Health Data
REUSE OF HEALTH DATA
▶ Aggregation, analysis and presentation of clinical, financial,
administrative and other related data
▶ Goal is to discover new valuable knowledge
▶ Identify trends, predict outcomes or influence patient care,
drug development, or therapy choices
▶ Patient recruiting & profiling for conducting clinical studies
17. DATA POOLS IN HEALTHCARE
MAIN IMPACT BY INTEGRATING VARIOUS AND HETEROGENEOUS DATA SOURCES
Pharmaceutical & R&D Data
• Owned by the pharmaceutical companies, research labs/academia, government
• Encompasses clinical trials, clinical studies, population and disease data, etc.
Clinical Data
• Owned by providers (such as hospitals, care centers, physicians, etc.)
• Encompasses any information stored within the classical hospital information systems or EHR, such as medical records, medical images, lab results, genetic data, etc.
Claims, Cost & Administrative Data
• Owned by providers and payors
• Encompasses any data sets relevant for reimbursement issues, such as utilization of care, cost estimates, claims, etc.
Patient Behaviour & Sentiment Data
• Owned by consumers or monitoring device producers
• Encompasses any information related to patient behaviours and preferences
Health data on the web
• Mainly open source
• Examples are websites such as PatientsLikeMe, Linked Open Data, etc.
Highest Impact on integrated data sets
18. Dr. Martin Strohbach
Senior Researcher
PEER ENERGY CLOUD
19. PEER ENERGY CLOUD
Smart grid pilot in Saarlouis
100 households
Berlin
Innovation award Saarlouis
Engage consumers to optimally use local solar energy
• Understand consumption and save
• Trade solar energy in the neighborhood to balance the grid
20. DEVICE LEVEL ENERGY MONITORING
Monitored/controlled grid today
Monitored/controlled grid tomorrow
Germany aims at 30% clean/
renewable energy by 2020,
seeking to build a smart grid
Sensors
today
Sensors
tomorrow
(consumer
level)
Energy
Consumption
Temperature
Movement,...
21. GETTING READY FOR DATA VOLUMES IN FUTURE GRIDS
The PeerEnergyCloud pilot allows us to get ready for future data volumes today
How much data is really needed for what?
• Today: 1 value per year
• Smart metering: 35,040 values per year (every 15 minutes)
• PeerEnergyCloud: 540 million values per year (7 devices per household every 2 seconds, 4-5 measurements per device)
• Future possibilities: ? billion values per year
• Optimum?
Analytics needed at this scale
• Real-time analytics on mass data (grouped aggregation)
• Scalable statistics over hundreds of millions of measurements
• Automatic detection of load anomalies (spotting inefficiencies and defects)
• Household activity state inference and prediction
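The data volumes on this slide can be checked with simple arithmetic. The following is a minimal sketch, assuming a 365-day year and assuming that the PeerEnergyCloud figure is counted per household (the slide does not say so explicitly); the function name `values_per_year` is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def values_per_year(interval_seconds, devices=1, measurements_per_device=1):
    """Number of values produced per year at a fixed sampling interval."""
    readings = SECONDS_PER_YEAR // interval_seconds
    return readings * devices * measurements_per_device


# Smart metering: one value every 15 minutes.
smart_metering = values_per_year(15 * 60)

# PeerEnergyCloud: 7 devices, one reading every 2 seconds,
# 4 to 5 measurements per device (assumed to be per household).
pec_low = values_per_year(2, devices=7, measurements_per_device=4)
pec_high = values_per_year(2, devices=7, measurements_per_device=5)

print(smart_metering)     # 35040
print(pec_low, pec_high)  # 441504000 551880000
```

Under these assumptions the 15-minute interval reproduces the 35,040 figure exactly, and the quoted 540 million values falls inside the computed range of roughly 440 to 550 million.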
22. IDENTIFIED NEEDS FOR DEVICE LEVEL MONITORING
Managing Large Data
RDBMSs did not support our data volumes as easily as Hadoop did
Real-time Insights
Real-time insights, e.g. for forecasting energy demand and detecting anomalies, are required to make
efficient decisions
Data Security and Privacy
Privacy- and confidentiality-preserving data analytics are required to enable the
service provider to retrieve the knowledge without violating the agreed-upon
granularity; in PEC this was realized by dynamic configurability of data
access (which data, what purpose, what granularity, …)
Ease of use
Simplifying the application of machine learning techniques to Big Data sets would
help speed up development, e.g. through unified batch/stream abstractions,
standardized data integration, and visualization tools
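One way to picture the "dynamic configurability of data access" mentioned under Data Security and Privacy is a policy that maps each purpose to the finest time granularity the consumer has agreed to share. The sketch below is not the PEC implementation; the policy table, purposes, and function name are hypothetical and only illustrate the idea of grouped aggregation before release.

```python
from collections import defaultdict

# Hypothetical policy: purpose -> agreed time granularity in seconds.
ACCESS_POLICY = {
    "billing": 15 * 60,
    "anomaly_detection": 60,
}


def readings_for(purpose, readings):
    """Return mean power per time bucket at the granularity agreed for a purpose.

    readings: iterable of (timestamp_seconds, watts) tuples.
    """
    if purpose not in ACCESS_POLICY:
        raise PermissionError(f"no agreed access for purpose: {purpose}")
    bucket = ACCESS_POLICY[purpose]
    grouped = defaultdict(list)
    for timestamp, watts in readings:
        # Align each reading to the start of its bucket.
        grouped[timestamp - timestamp % bucket].append(watts)
    return {start: sum(v) / len(v) for start, v in sorted(grouped.items())}


raw = [(0, 100.0), (2, 110.0), (58, 120.0), (60, 300.0), (62, 500.0)]
per_minute = readings_for("anomaly_detection", raw)
per_quarter_hour = readings_for("billing", raw)
print(per_minute)        # {0: 110.0, 60: 400.0}
print(per_quarter_hour)  # {0: 226.0}
```

The raw 2-second readings never leave the function; each purpose only sees averages at its agreed resolution, and an unknown purpose is rejected.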
23. KEY TECHNOLOGY TRENDS
24. THE DATA VALUE CHAIN
Data Acquisition
• Structured data
• Unstructured data
• Event processing
• Sensor networks
• Protocols
• Real-time
• Data streams
• Multimodality
Data Analysis
• Stream mining
• Semantic analysis
• Machine learning
• Information extraction
• Linked Data
• Data discovery
• ‘Whole world’ semantics
• Ecosystems
• Community data analysis
• Cross-sectorial data analysis
Data Curation
• Data Quality
• Trust / Provenance
• Annotation
• Data validation
• Human-Data Interaction
• Top-down/Bottom-up
• Community / Crowd
• Human Computation
• Curation at scale
• Incentivisation
• Automation
• Interoperability
Data Storage
• In-Memory DBs
• NoSQL DBs
• NewSQL DBs
• Cloud storage
• Query Interfaces
• Scalability and Performance
• Data Models
• Consistency, Availability, Partition-tolerance
• Security and Privacy
• Standardization
Data Usage
• Decision support
• Predictions
• In-use analytics
• Simulation
• Exploration
• Modeling
• Control
• Domain-specific usage
Big Data Value Chain
• Technical working groups examine the state of the art and future developments
across the whole big data value chain.
• Working groups publish technical white papers that result from desk research and in-depth
interviews with leading experts.
25. IMPROVING USABILITY
Usability
▶ Lowering the usability barrier for data tools: Users should
be able to directly manipulate the data
▶ Improvement of Human-Data interaction: Enabling experts
& casual users to query, explore, transform, & curate data
▶ Interactive exploration: Big Data generates insights beyond
existing models, so new analysis interfaces must support browsing
and modeling (visual analytics)
▶ Convergence within
analytical frameworks
Analytical databases for better
performance and lower
development complexity
(Mahout, Spark, Hadoop/R,
rasdaman, SciDB)
26. BLENDING HUMAN AND ALGORITHM
Blended Approaches
▶ Blended human and algorithmic data processing
approaches for coping with data acquisition, transformation,
curation, access, and analysis challenges for Big Data
Analytics & Algorithms
- Entity Linking
- Data Fusion
- Relation Extraction
Human Computation
- Relevance Judgment
- Data Verification
- Disambiguation
Internal Community
- Domain Knowledge
- High Quality Responses
- Trustable
External Crowd
- High Availability
- Large Scale
- Expertise Variety
Web Data, Databases, Sensor Data
Programmers, Managers
Better Data
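A blended approach can be sketched as a simple routing rule: accept algorithmic results the system is confident about and send the rest to human computation. The example below is only an illustration under stated assumptions; the confidence threshold, the entity identifiers, and the simulated reviewer are all hypothetical and not taken from the BIG project.

```python
def blend(candidates, ask_human, threshold=0.8):
    """Accept confident algorithmic links; route the rest to a human reviewer."""
    resolved = {}
    for mention, (entity, confidence) in candidates.items():
        if confidence >= threshold:
            resolved[mention] = (entity, "algorithm")
        else:
            resolved[mention] = (ask_human(mention, entity), "human")
    return resolved


# Hypothetical entity-linking output: mention -> (best candidate, confidence).
candidates = {
    "Galway": ("City_of_Galway", 0.95),
    "Insight": ("Insight_(magazine)", 0.40),
}


def simulated_reviewer(mention, suggestion):
    """Stand-in for an internal expert or external crowd worker."""
    corrections = {"Insight": "Insight_Centre_for_Data_Analytics"}
    return corrections.get(mention, suggestion)


result = blend(candidates, simulated_reviewer)
for mention, (entity, source) in result.items():
    print(mention, "->", entity, f"({source})")
```

Raising the threshold shifts more work to people and lowering it shifts more to the algorithm, which is the trade-off between quality and scale that the slide contrasts for internal communities and external crowds.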
27. A CROSS-SECTOR TREND…
Telco, Media, & Entertainment
Manufacturing, Retail, Energy & Transport
Public Sector
Life Sciences
28. Ecosystems are Important
▶ Community provided data (crowd-based collection, data
quality, analysis and usage)
▶ Community tools which are interoperable and usable
▶ Support from large communities or large companies
COMMUNITY AND ECOSYSTEMS
Community
▶ Solutions based on large communities (crowd-based
approaches) and Ecosystems are emerging as a trend to
cope with Big Data challenges
Emerging Economic Model for Open Data
▶ Pre-competitive collaboration efforts
▶ Pistoia Alliance (pharmaceutical data)
▶ Share costs, risks and technical challenges
▶ Benefit from collective wisdom and network
effect for curated dataset
29. COMMUNITY DATA
Community Analysis and Collection
• Number of data collection points can be dramatically increased
• Communities are creating bespoke tools for the particular situation and to
handle any problems in data collection (Developer Ecosystem)
• Citizen engagement is increased significantly
Real-time City Noise Levels
Radiation monitoring
30. STANDARDS
Standardization & interoperability
▶ Principled semantic and standardized data representation
models are central to cope with data heterogeneity
▶ Minimum information models needed
▶ Significant increase in the use of new data models (e.g. graph-based)
for expressivity and flexibility
▶ Better integration between data tools
▶ Standardization of Query Interfaces
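The expressivity and flexibility attributed to graph-based data models can be illustrated with subject-predicate-object triples, the shape used by Linked Data. This is a minimal in-memory sketch, not a real triple store; all identifiers and the `match` helper are made up for illustration.

```python
# Each fact is a (subject, predicate, object) triple; identifiers are illustrative.
triples = {
    ("household:42", "hasDevice", "device:fridge"),
    ("household:42", "locatedIn", "city:Saarlouis"),
    ("device:fridge", "reports", "measure:power"),
}


def match(pattern, graph):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# A new kind of fact needs no schema change: just add another triple.
triples.add(("device:fridge", "madeBy", "vendor:acme"))

about_fridge = match(("device:fridge", None, None), triples)
print(about_fridge)
```

The same pattern-matching call serves every predicate, which hints at why a standardized query interface becomes feasible once the representation itself is uniform.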
source: TU Berlin, FG DIMA 2013
Open Technology Stacks
Open Challenges
• Unclear Adoption Paths for
Non-IT Based Sectors
• Lack of standards and
best practices is a major
barrier for adoption
• Privacy and Security are
Lagging Behind
31. END-TO-END ARCHITECTURES
Architectures
▶ Design end-to-end architectures for full data lifecycle
▶ Support for both “Data-at-Rest” and “Data-in-Motion”
▶ Data Hubs and Markets: Hadoop-based solutions tend to
become the central integration point for all enterprise data
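Supporting both “Data-at-Rest” and “Data-in-Motion” is easier when the same analysis logic can run over a stored batch and over a live stream. The sketch below shows one possible way to do this with a plain Python generator; it is an illustration of the idea, not a reference to any specific framework, and the simulated sensor is hypothetical.

```python
import itertools


def running_mean(values):
    """Yield the mean of all values seen so far; works on any iterable."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


# Data-at-rest: a finite, stored batch; only the final mean is of interest.
stored = [2.0, 4.0, 6.0]
batch_result = list(running_mean(stored))[-1]


# Data-in-motion: an unbounded source, consumed incrementally.
def sensor():
    """Hypothetical endless sensor emitting 1.0, 2.0, 3.0, ..."""
    for i in itertools.count(1):
        yield float(i)


first_three = list(itertools.islice(running_mean(sensor()), 3))

print(batch_result)  # 4.0
print(first_three)   # [1.0, 1.5, 2.0]
```

Because `running_mean` never needs the whole input at once, the batch case is just a stream that happens to end.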
32. Key Technical Requirements
BIGGEST BLOCKERS
▶ Lack of Business-driven Big Data strategies
▶ Undiscovered and unclaimed potential business
value
▶ Data Sharing & Exchange
▶ Need for format and data storage technology
standards
▶ Data Privacy and Security
▶ Regulations & markets for data access
▶ Legal frameworks for data sharing &
communication are needed
▶ Human resources
▶ Lack of skilled data scientists and data
engineers
33. The Data Landscape
▶ Much of (Big Data) technology
is evolving in an evolutionary way
▶ But business process change
must be revolutionary
▶ Data variety and verifiability
are key opportunities
▶ Long tail of data variety is a
major shift in the data landscape
Biggest Blockers
▶ Lack of Business-driven Big Data
strategies
▶ Need for format and data storage
technology standards
▶ Data exchange between
companies, institutions, individuals,
etc.
▶ Regulations & markets for data
access
▶ Human resources: Lack of skilled
data scientists and data
engineers
KEY INSIGHTS
Key Trends
▶ Lower usability barrier for data tools
▶ Blended human and algorithmic data processing for coping with
data quality
▶ Leveraging large communities (crowds)
▶ Need for semantic standardized data representation
▶ Significant increase in use of new data models (e.g. graph)
for expressivity and flexibility
34. Thank you
Dr. Edward Curry
Research Fellow,
Insight @ NUI Galway.
ed.curry@insight-centre.org
Interviews, Technical White
Papers, Sector's requisites and
Roadmaps available on:
http://www.big-project.eu
Tilman Becker (DFKI, Data Usage), Andre Freitas (NUI Galway, Data Curation),
John Domingue (STI, Data Analysis), Helen Lippell (Press Association, Media),
Felicia Lobillo (ATOS, Retail), Ricard Munné (ATOS, Public Sector), Axel
Ngonga (InfAI, Data Acquisition), Denise Paradowski (DFKI, Retail), Sebnem
Rusitschka (Siemens, Energy and Transport), Holger Ziekow (AGT, PEC),
Martin Strohbach (AGT, Data Storage), Sonja Zillner (Siemens, Health), and all
the many many contributors to the Technical Working Groups and Sectorial
Forums
http://www.bigdatavalue.eu http://www.big-project.eu