Talk given to public sector officials in Spain on technological trends in Big Data. It aimed to raise awareness of the BIG project (http://www.big-project.eu/) and gather feedback for the sector roadmap.
Big Data Public Private Forum 2013: public sector meeting in Spain, Big Data technological trends
1. Big Data Public Private Forum
BIG DATA
TECHNOLOGICAL TRENDS
First workshop for the construction of a Roadmap for Big
Data in Europe
16/04/2013
Tomás Pariente – Atos Research and Innovation /
BIG Project
2. WE KNOW WHAT BIG DATA IS, RIGHT?
[Figure labels: "Hi, big brother" (twice); "BIG DATA"; "Small DATA"]
First workshop for the construction of the Big Data roadmap for Europe, 16/04/2013. BIG 318062
3. BIG DATA IS NOT ONLY ABOUT SIZE: DATA DIVERSITY MATTERS
3 Vs: Volume, Velocity, Variety
[Figure: Traditional structured data + (big size, unstructured, multimedia, real time, “exhaust”, linked/shared, social, open) = BIG DATA]
4. THE DATA DELUGE
EXPONENTIAL DATA GROWTH
IBM: “Every day, we create 2.5 quintillion bytes of data - so much that 90% of the data in the world today has been created in the last two years alone.”
“Data is the new gold”1
Big Data definition: when dealing with data becomes the problem
[Figure labels: Data; Data mgmt]
1 Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda
5. BIG DATA
TECHNOLOGIES LANDSCAPE
[Figure: technology landscape, split into batch processing and real-time processing]
6. BIG DATA STORAGE
NOSQL AND BEYOND
• Distributed File Systems:
– Hadoop File System (HDFS).
– Capability to store large amounts of unstructured data in a reliable way on
commodity hardware.
• NoSQL Databases:
– Use data models other than the relational model known from the SQL world
– Do not necessarily adhere to the transactional properties of atomicity, consistency,
isolation and durability (ACID).
• NewSQL Databases: Shorthand for new scalable/high-performance SQL DBs.
– SQL as the primary mechanism for application interaction
– ACID support for transactions
– A non-locking concurrency control mechanism
– An architecture providing much higher per-node performance
– A scale out, shared-nothing architecture, capable of running on a large number
of nodes without suffering bottlenecks.
– The expectation is that NewSQL systems are about 50 times faster than
traditional OLTP RDBMS.
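The ACID support mentioned above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module only as a convenient stand-in for a transactional SQL engine; it is not a NewSQL system, and the accounts table, the names and the transfer function are hypothetical, introduced purely for illustration. The point it shows is atomicity: a transfer that fails halfway leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    # `with conn:` commits if the block succeeds and rolls back on any exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: nothing is half-applied.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


# Hypothetical data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails and is rolled back as a whole
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Many NoSQL stores relax exactly this guarantee in exchange for easier scale-out, which is the trade-off the NewSQL bullet points aim to avoid.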
7. BIG DATA
THE NOSQL WORLD
Schema-less, unstructured
• Apache HBase
– Row – Column – Timestamp
– Value = String
– Several columns
• Voldemort
• CouchDB, MongoDB
– Documents stored in JSON or XML
– Accessible by key or content
• Neo4j
– Graph structures
– Highly associative data, social networks
– Accessible by key or content
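As a rough illustration of how these data models differ, the sketch below mimics each one with plain Python structures. It does not use the client APIs of HBase, Voldemort, CouchDB, MongoDB or Neo4j; every key, value and helper function is hypothetical and only meant to show the shape of the data and the typical access path.

```python
import json

# Key-value (in the style of Voldemort): an opaque value, looked up by key only.
kv_store = {"user:42": b'{"name": "Ana", "city": "Madrid"}'}

# Wide-column (in the style of HBase): (row, column, timestamp) -> string value.
column_store = {
    ("user:42", "info:city", 1): "Sevilla",
    ("user:42", "info:city", 2): "Madrid",
    ("user:42", "info:name", 1): "Ana",
}


def latest(store, row, column):
    """Return the value with the highest timestamp for one row/column cell."""
    versions = [(ts, v) for (r, c, ts), v in store.items() if r == row and c == column]
    return max(versions)[1] if versions else None


# Document (in the style of CouchDB/MongoDB): JSON documents, by key or content.
documents = {"42": json.loads('{"name": "Ana", "city": "Madrid", "tags": ["open data"]}')}


def find_by_content(docs, field, value):
    """Return the keys of all documents whose `field` equals `value`."""
    return [key for key, doc in docs.items() if doc.get(field) == value]


# Graph (in the style of Neo4j): labelled edges between nodes, traversed by relationship.
edges = [("Ana", "FOLLOWS", "Luis"), ("Luis", "FOLLOWS", "Eva")]


def neighbours(graph_edges, node, label):
    """Return the nodes reachable from `node` over edges carrying `label`."""
    return [dst for src, lbl, dst in graph_edges if src == node and lbl == label]
```

For example, latest(column_store, "user:42", "info:city") picks the newest version of a cell, while find_by_content(documents, "city", "Madrid") searches inside the documents rather than by key.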
8. BIG DATA
APACHE BIG DATA TOOLS
[Figure: overview of Apache Big Data tools. Courtesy of Michael Hausenblas]
Kick Off meeting 10-11/09/2012 8 BIG 318062
9. BIG DATA IS ABOUT CHOOSING THE RIGHT THING
• Data: 3 V’s, huge volume, unstructured, social, linked, corporate, events, logs, …
– Data curation
• Acquisition
– Acquisition software
– Apache Kafka: messaging, publish-subscribe, hundreds of thousands per second
– Apache Flume: for events or logs, event pushing
• Batch processing (batch / historical)
– Map-Reduce (Hadoop): analytical platform, intelligent parallelization, reliability
– Distributed massive storage: Hadoop File System (HDFS), NoSQL (HBase, Cassandra, CouchDB…)
– Long-lasting analytical algorithms: iterative process, might take days
• Real-time processing
– Processing platforms: Storm/Apache S4
– Stream processing, intelligent parallelization, robust and flexible topologies
– Running pipelines: fast algorithms, high throughput, no storage or complex storage
• Query facade
– Performant queries, real-time
– Cloudera Impala, Apache HIVE, Apache Solr (Lucene), RDBMS (SQL), …
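The Map-Reduce model named on this slide can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases using only the Python standard library; it is not Hadoop's API, and the function names and sample records are assumptions made for the example. In a real cluster the framework runs the map and reduce calls in parallel across many nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over a list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input records.
logs = ["big data", "open data", "Big Data roadmap"]
counts = map_reduce(logs)
```

Because each map call looks at one record and each reduce call at one key, both phases can be spread over many machines, which is what makes the model suitable for the long-running batch jobs described above.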
10. TRENDS
Big Data trends
• Big Data Analytics
– New efficient and scalable algorithms
– Multidisciplinary teams (data scientists)
– Understand technology platforms
– Aggregation and correlation algorithms
• New visualization and query techniques
– Big Data vs small eyes
– Take “time” into account
– Faster queries
• Going real-time
– Stream processing
– Real-time queries
– Real-time visualization paradigms
• Data management
– Performance and scalability
– Storage selection and costs
– Cloud vs. data centers
• Data curation
– Data selection
– Data value and garbage
– Trust, provenance
• New business models
– New business models for selling data
– Dealing with privacy, ownership
– Fostering reuse of data
– “Do it before the competitors do”
11. BUT BE AWARE OF THE RISKS
• Too many solutions: blank page blockage
• Get hold of data: break data silos
• Policies: security, privacy, IPR
• Data quality: curation, trust, provenance
• Investment: old apps, storage, data centers vs cloud
• Few professionals: data scientists
12. BIG DATA AND THE PUBLIC SECTOR
FINDINGS FROM THE TECHAMERICA SURVEY
1 Big Data is Here to Stay: 82% Say Real-Time Big Data is the Way of the Future
2 Real-Time Big Data Could Save Government 10% or More Annually
3 Real-Time Big Data Could Save Significant Number of Lives
4 Big Data is Helping Improve the Quality of Citizens’ Lives
5 State IT Officials Agree Big Data Can Improve Social and Welfare Services
6 Big Data Advances in Medicine, Public Safety Seen as Most Important
7 Privacy and Policy Concerns Remain a Barrier to Utilizing Big Data
8 Public Sector IT Officials Frustrated With Multiple Data Formats, Leadership Changes
9 Many Public Sector IT Officials Say Database Queries Take Too Much Time
10 Nearly All Government IT Officials Would Opt For Real Time Access to Data Over Backward Looking Queries
13. Big Data in the Public Sector
14. PROJECT BIG - SECTOR FORUMS AND TECHNICAL WORKING GROUPS
15. PROJECT BIG
SECTORS’ ROADMAP
• Identification of Sectors’ requirements
▶ Requirements and objectives from all Sectors (industry-driven working groups)
• Applicability of Big Data technical white papers in each Sector
▶ Introduce technologies and trends to the stakeholders so they better understand Big Data technologies and their capabilities
• Elaboration of Sector Roadmap
▶ Sectorial roadmap (elaborate a roadmap per sector)
▶ Contributions towards an integrated roadmap (cross-sectorial)
16. PROJECT BIG - TIMELINE OF THE MOST IMPORTANT DELIVERABLES
• 04/2013: D2.2.1, first version of Technical white papers
• 04/2013: D2.3.1, first version of Sectors’ requirements
• 06/2013: D4.2.1, first version of IPR, Standardization recommendations
• 09/2013: D2.4.1, first version of Sectors’ Roadmap
• 10/2013: D4.3.1, first draft of the Big Data Public-Private Forum
• 01/2014: D2.2.2, final version of Technical white paper
• 04/2014: D2.3.2, final version of Sectors’ requirements
• 04/2014: D4.2.2, final version of IPR, Standardization recommendations
• 10/2014: D2.5, cross-sectorial roadmap consolidation
17. THANKS
Tomás Pariente Lobo
Atos Research & Innovation
Atos Spain
tomas.parientelob@atosresearch.eu
Editor's Notes
STE is an emerging technology where we are involved as one of the main IT companies in Europe.