BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Big Data and Fast Data combined – is it possible?
Ulises Fasoli
Swiss Data Forum
24.11.2015 - Lausanne
Ulises Fasoli
• Senior Consultant @ Trivadis – Lausanne
• 8+ years of software development experience
• Occasional blogger
• Contact information :
• Email : ulises.fasoli@trivadis.com
• Blog: http://ufasoli.blogspot.com
• Twitter: ufasoli
Our company
Trivadis is a market leader in IT consulting, system integration, solution engineering and
the provision of IT services focusing on and technologies in Switzerland, Germany, Austria
and Denmark. We offer our services in the following strategic business fields:
OPERATION
Trivadis Services takes over the interacting operation of your IT systems.
Agenda
1. Big Data and Fast Data, what is it?
2. Architecting (Big) Data Systems
3. The Lambda Architecture
4. Use Case and the Implementation
5. Summary and Outlook
Big Data and Fast Data, what is it?
Big Data Definition (4 Vs)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
Time to action? -> Big Data + Event Processing = Fast Data
The world is changing …
The model of Generating/Consuming Data has changed ….
Old Model: few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
[Infographic: what happens online in 60 seconds]
[Chart: data volume per type]
Internet Of Things – Sensors are/will be everywhere
• There are more devices tapping into the internet than people on earth
• How do we prepare our systems/architecture for the future?
The world is changing …
new data stores
Problem of traditional (R)DBMS approach:
Complex object graph
Schema evolution
Semi-structured data
Scaling
Polyglot persistence
Using multiple data storage technologies (RDBMS + NoSQL + NewSQL + In-Memory)
[Diagram: the same order stored in relational tables (ORDER, ADDRESS, CUSTOMER, ORDER_LINES) and as one complex object graph]
Order
  ID: 1001
  Order Date: 15.9.2012
Customer
  First Name: Peter
  Last Name: Sample
Billing Address
  Street: Somestreet 10
  City: Somewhere
  Postal Code: 55901
Line Items
  Name            Quantity   Price
  Ipod Touch      1          220.95
  Monster Beat    2          190.00
  Apple Mouse     1          69.90
The world is changing … New platforms evolving (e.g. Hadoop Ecosystem)
Data as an Asset – Store everything?
"Data is just too valuable to delete! We must store everything!"
"Nonsense! Just store the data you know you need today!"
It depends …
Big Data technologies allow you to store the raw information from new and existing data sources, so that you can later use it to create new data-driven products that you haven't thought about today!
Architecting (Big) Data Systems
What is a data system?
• A (data) system that manages the storage and querying of
data with a lifetime measured in years encompassing every
version of the application to ever exist, every hardware
failure and every human mistake ever made.
• A data system answers questions based on information that
was acquired in the past
What is a data system? - Goal
• The goal of a data system is to compute arbitrary
functions on arbitrary data.
• Questions are answered by running functions that take
data as input
query = function (all data)
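The idea "query = function (all data)" can be sketched in a few lines. This is only an illustration: the `page_views` records and the two query functions are made-up examples, not part of the talk.

```python
from typing import Callable, Iterable, TypeVar

Record = TypeVar("Record")
Result = TypeVar("Result")


def run_query(query: Callable[[Iterable[Record]], Result],
              all_data: Iterable[Record]) -> Result:
    """Answer a question by applying a function to the complete dataset."""
    return query(all_data)


# Hypothetical dataset: one record per page view.
page_views = [
    {"url": "/home", "user": "anna"},
    {"url": "/home", "user": "ben"},
    {"url": "/pricing", "user": "anna"},
]


def views_per_url(records):
    """Arbitrary function 1: count page views per URL."""
    counts = {}
    for record in records:
        counts[record["url"]] = counts.get(record["url"], 0) + 1
    return counts


def unique_users(records):
    """Arbitrary function 2: count distinct users."""
    return len({record["user"] for record in records})


print(run_query(views_per_url, page_views))
print(run_query(unique_users, page_views))
```

Any new question becomes a new function over the same data; the data itself does not have to be reshaped for it.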
Desired properties of a data system
• General
• Extensible
• Allow ad-hoc queries
• Robust and fault tolerant
• Low latency read / updates
• Scalable
• Minimal maintenance
• Debuggable
How do we build (data) systems today – Today’s Architectures
• CRUD pattern
What is the problem with this?
• Lack of Human Fault Tolerance
• Potential loss of information/data
[Diagram: Mobile, Web, RIA and Rich Client front ends use an Application (Query) on top of a mutable database (RDBMS, NoSQL, NewSQL), which is the Source of Truth]
Source of Truth is mutable!
Lack of Human Fault Tolerance
Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
• Just like hard disks, CPUs, memory, software
• design for human error like you design for any other fault
Examples of human error
• Deploy a bug that increments counters by two instead of by one
• Accidentally delete data from database
• Accidental DoS on important internal service
Worst two consequences: data loss or data corruption
As long as an error doesn't lose or corrupt good data, you can fix what went wrong
Lack of Human Fault Tolerance – Immutability vs. Mutability
The U and D in CRUD
A mutable system updates the current state of the world
Mutable systems inherently lack human fault-tolerance
Easy to corrupt or lose data
Lack of Human Fault Tolerance – Immutability vs. Mutability
An immutable system captures historical records of events
Events happen at a particular time and are always true
Immutability restricts the range of errors causing data loss/data corruption
Vastly more human fault-tolerant
Conclusion: Your source of truth should always be immutable
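A minimal sketch of why an immutable source of truth tolerates the "increment by two" bug mentioned earlier. The `PageView` event type and the counting functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """An immutable fact: something that happened at a particular time."""
    url: str
    timestamp: int


def buggy_count(events):
    # The deployed bug: counts two per event instead of one.
    return sum(2 for _ in events)


def fixed_count(events):
    return sum(1 for _ in events)


events = [PageView("/home", 1), PageView("/home", 2), PageView("/home", 3)]

# Mutable approach: only the derived state is stored and updated in place,
# so the buggy release overwrites the truth and the facts are gone.
mutable_counter = 0
for _ in events:
    mutable_counter += 2

# Immutable approach: the events are kept, the count is only a derived view.
view = buggy_count(events)   # wrong view produced by the buggy release
view = fixed_count(events)   # after the fix, recompute from the untouched log

print(mutable_counter)
print(view)
```

The mutable counter stays wrong unless someone knows exactly when the bug was live; the view over the event log is simply recomputed.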
A different kind of architecture with immutable source of truth
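Instead of using our traditional approach … why not build data systems like this?
[Diagram: Mobile, Web, RIA and Rich Client front ends use an Application (Query) that reads Views on Data; the views are derived from Immutable data, which is the Source of Truth. Storage technologies shown: HDFS, NoSQL, NewSQL, RDBMS]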
How to create the views on the Immutable data?
On the fly ?
Materialized, i.e. Pre-computed ?
[Diagram: on the fly: Immutable data -> View -> Query; materialized: Immutable data -> Pre-Computed Views -> Query]
(Big) Data Processing
[Diagram: Incoming Data -> Immutable data -> Pre-Computed Views -> Query??]
How to compute the materialized views ?
How to compute queries from the views ?
Today Big Data Processing means Batch Processing …
[Diagram: events from Stream 1 and Stream 2 are stored in HDFS (Hadoop Distributed File System); a Hadoop cluster (Map/Reduce) processes them into a data store optimized for appending large results, which answers the queries]
Big Data Processing - Batch
Favorite Product List Changes (raw information => data):
01.02.13 Add iPAD 64GB
10.03.13 Add Sony RX-100
11.03.13 Add Canon GX-10
11.03.13 Remove Sony RX-100
12.03.13 Add Nikon S-100
14.04.13 Add BoseQC-15
15.04.13 Add MacBook Pro 15
20.04.13 Remove Canon GX-10
compute => Current Favorite Product List (information => derived):
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
derive => Current Product Count: 4
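The batch computation in this example can be sketched as a replay of the change log. The function name `current_favorites` is an assumption for this illustration, and product names are spelled consistently here so that each Remove entry matches its Add entry.

```python
# Raw change log from the slide: (date, action, product).
changes = [
    ("01.02.13", "Add", "iPAD 64GB"),
    ("10.03.13", "Add", "Sony RX-100"),
    ("11.03.13", "Add", "Canon GX-10"),
    ("11.03.13", "Remove", "Sony RX-100"),
    ("12.03.13", "Add", "Nikon S-100"),
    ("14.04.13", "Add", "BoseQC-15"),
    ("15.04.13", "Add", "MacBook Pro 15"),
    ("20.04.13", "Remove", "Canon GX-10"),
]


def current_favorites(change_log):
    """Batch-compute the current list by replaying every change in order."""
    favorites = []
    for _date, action, product in change_log:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


favorites = current_favorites(changes)   # derived information
product_count = len(favorites)           # derived from the derived list
print(favorites)
print(product_count)
```

The change log is the only thing that has to be stored; both the list and the count can be thrown away and recomputed at any time.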
Big Data Processing –
Batch
But we are not done yet …
[Timeline: the source of truth holds all data up to now; the batch results cover only the fully processed data of the last full batch period; data that arrived during the time for the batch job, up to now, is non-processed]
Big Data Processing - Adding Real-Time
[Diagram: Incoming Data -> Immutable data -> Batch Views -> Query; Incoming Data -> Data Stream -> Realtime Views -> Query]
How to compute real-time views?
How to compute queries from the views?
Big Data Processing - Adding Real-Time
Favorite Product List Changes (immutable data):
1.2.13 Add iPAD 64GB
10.3.13 Add Sony RX-100
11.3.13 Add Canon GX-10
11.3.13 Remove Sony RX-100
12.3.13 Add Nikon S-100
14.4.13 Add BoseQC-15
15.4.13 Add MacBook Pro 15
compute => Current Favorite Product List (views):
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon GX-10
Stream of Favorite Product List Changes (incoming data stream):
Now Add Canon Scanner
compute => Canon Scanner
Query => Current Product Count: 6
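A sketch of how a query could combine the batch view with changes that arrived after the last batch run. The name `merged_favorites` and the representation of the stream as `(action, product)` pairs are assumptions for this illustration.

```python
# Batch view as of the last batch run (values from the slide).
batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15", "Canon GX-10"]

# Changes that arrived on the stream since then.
recent_changes = [("Add", "Canon Scanner")]


def merged_favorites(batch_view, recent_changes):
    """Apply the not-yet-batched changes on top of a copy of the batch view."""
    merged = list(batch_view)
    for action, product in recent_changes:
        if action == "Add" and product not in merged:
            merged.append(product)
        elif action == "Remove" and product in merged:
            merged.remove(product)
    return merged


current = merged_favorites(batch_view, recent_changes)
print(len(current))
```

The batch view itself is never modified; the recent changes are only needed until the next batch run has absorbed them.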
Big Data Processing -
Batch & Real Time
[Timeline: batch processing (e.g. Hadoop) worked fine for the fully processed data up to the last full batch period; real time processing works for the span from there to now, including the time for the batch job; both are combined into a blended view for the end user]
Adapted from Ted Dunning (March 2012):
http://www.youtube.com/watch?v=7PcmbI5aC20
The Lambda Architecture
Lambda Architecture
[Diagram: Incoming Data feeds the Batch Layer (Immutable data -> Batch View) and the Speed Layer (Data Stream -> Realtime View); the Serving Layer answers the Query from both views. The components are labelled A to F.]
Lambda => Query = function(all data)
Lambda Architecture
A. All data is sent to both the batch and speed layer
B. Master data set is an immutable, append-only set of data
C. Batch layer pre-computes query functions from scratch, result is called Batch Views.
Batch layer constantly re-computes the batch views.
D. Batch views are indexed and stored in a scalable database to get particular values
very quickly. Swaps in new batch views when they are available
E. Speed layer compensates for the high latency of updates to the Batch Views
F. Uses fast incremental algorithms and read/write databases to produce real-time views
G. Queries are resolved by getting results from both batch and real-time views
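Steps A to G can be sketched as a toy, single-process model. `LambdaSketch` and its methods are hypothetical names; a real system would use distributed components, and this sketch only counts events per key.

```python
from collections import Counter


class LambdaSketch:
    """Toy in-memory stand-in for batch, speed and serving layer."""

    def __init__(self):
        self.master_dataset = []        # B: immutable, append-only
        self.batch_view = Counter()     # C/D: precomputed from scratch
        self.realtime_view = Counter()  # E/F: incremental and transient

    def ingest(self, key):
        # A: every record is sent to both the batch and the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # C: recompute the batch view from scratch over a snapshot of all data.
        snapshot = list(self.master_dataset)
        new_view = Counter(snapshot)
        # D: swap in the new batch view.
        self.batch_view = new_view
        # E: keep real-time state only for records the batch view does not cover.
        self.realtime_view = Counter(self.master_dataset[len(snapshot):])

    def query(self, key):
        # G: resolve the query from both the batch and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaSketch()
for tag in ["#a", "#b", "#a"]:
    arch.ingest(tag)
before_batch = arch.query("#a")   # answered by the real-time view only

arch.run_batch()
arch.ingest("#a")
after_batch = arch.query("#a")    # batch view plus one real-time increment
print(before_batch, after_batch)
```

Because the real-time view is discarded after each batch run, any error in the incremental logic only lives until the next recomputation.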
Lambda Architecture
Batch Layer
• Stores the immutable, constantly growing dataset
• Computes arbitrary views from this dataset using Big Data technologies (can take hours)
• Can always be recreated
Speed Layer
• Computes the views from the constant stream of data it receives
• Needed to compensate for the high latency of the batch layer
• Incremental model; views are transient
Serving Layer
• Responsible for indexing and exposing the pre-computed batch views so that they can be queried
• Exposes the incremented real-time views
• Merges the batch and the real-time views into a consistent result
Lambda Architecture
Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
[Diagram: Incoming Data (social, mobile, IoT, …) enters the Sensor Layer and is passed through the Distribution Layer to both the Batch Layer (All data -> Batch recompute -> Precomputed information) and the Speed Layer (Process stream -> Realtime increment -> Incremented information). The Serving Layer holds the batch views and real time views (Precompute Views), which a DataService (Merge) combines for Visualization.]
Use Case and the Implementation
Project Definition
Build a platform for analysing Twitter communications retrospectively and in real time
Scalability and the ability for future data fusion with other information are a must
Provide Web-based access to the analytical information
Invest in new, innovative and not widely proven technology
– PoC environment, a pre-investment for future systems
Anatomy of a tweet
{
"created_at":"Sun Aug 18 14:29:11 +0000 2013",
"id":369103686938546176,
"id_str":"369103686938546176",
"text":"Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
"source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e",
"truncated":false,
"in_reply_to_status_id":null,
"in_reply_to_status_id_str":null,
"in_reply_to_user_id":null,
"in_reply_to_user_id_str":null,
"in_reply_to_screen_name":null,
"user":{
"id":15032594,
"id_str":"15032594",
"name":"Juan Carlos Romo\u2122",
"screen_name":"jcsromo",
"location":"Sopuerta, Vizcaya",
"url":null,
"description":"Portugalujo, saturado de todo, de baloncesto no. Twitter personal.",
"followers_count":1331,
"friends_count":1326,
"listed_count":31,
"geo_enabled":true,
"verified":false,
"statuses_count":22787,
"lang":"es",
"contributors_enabled":false,
"is_translator":false,
…
"profile_text_color":"666666",
"profile_use_background_image":true,
"default_profile":false,
"default_profile_image":false
},
"geo":{"type":"Point","coordinates":[43.28261499,-2.96464655]},
"coordinates":{"type":"Point","coordinates":[-2.96464655,43.28261499]},
"place":{"id":"cd43ea85d651af92",
"url":"https://api.twitter.com/1.1/geo/id/cd43ea85d651af92.json",
"place_type":"city",
"name":"Bilbao",
"full_name":"Bilbao, Vizcaya",
"country_code":"ES",
"country":"Espa\u00f1a",
"bounding_box":{"type":"Polygon","coordinates":[[[-2.9860102,43.2136542],
[-2.9860102,43.2901452],[-2.8803248,43.2901452],[-2.8803248,43.2136542]]]},
"attributes":{}},
"contributors":null,
"retweet_count":0,
"favorite_count":0,
"entities":{"hashtags":[{"text":"quelosepash","indices":[58,70]}],
"symbols":[],
"urls":[],
"user_mentions":[]},
"favorited":false,
"retweeted":false,
"filter_level":"medium",
"lang":"es"}
Annotated aspects of the tweet: Time, Space, Content, Social, Technical
Views on Tweets in four dimensions
when ⇐ where+what+who
• Time series
• Timelines
where ⇐ when+what+who
• Geo maps
• Density plots
what ⇐ when+where+who
• Word clouds
• Topic trends
who ⇐ when+where+what
• Social network graphs
• Activity graphs
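A sketch of how one tweet could be projected onto the four dimensions. The field names come from the example tweet shown earlier; the `dimensions` function and its output shape are assumptions for this illustration.

```python
from datetime import datetime

# Abridged copy of the example tweet from the slides.
tweet = {
    "created_at": "Sun Aug 18 14:29:11 +0000 2013",
    "text": "Baloncesto preparación Eslovenia, Rajoy derrota a Merkel. #quelosepash",
    "user": {"screen_name": "jcsromo"},
    "coordinates": {"type": "Point", "coordinates": [-2.96464655, 43.28261499]},
    "entities": {
        "hashtags": [{"text": "quelosepash", "indices": [58, 70]}],
        "user_mentions": [],
    },
}


def dimensions(tweet):
    """Project a tweet onto the when / where / what / who dimensions."""
    when = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")

    # The "coordinates" field is ordered [longitude, latitude].
    where = None
    point = tweet.get("coordinates")
    if point:
        lon, lat = point["coordinates"]
        where = (lat, lon)

    what = [tag["text"] for tag in tweet["entities"]["hashtags"]]
    who = {
        "author": tweet["user"]["screen_name"],
        "mentions": [m["screen_name"] for m in tweet["entities"]["user_mentions"]],
    }
    return {"when": when, "where": where, "what": what, "who": who}


result = dimensions(tweet)
print(result["when"].isoformat(), result["where"], result["what"], result["who"])
```

Each view type in the list above then aggregates one dimension while filtering on the other three, for example a time series of a hashtag within a region.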
Accessing Twitter
Source                       Limit                       Price
Twitter's Search API         3200 / user                 free
                             5000 / keyword
                             180 queries / 15 minutes
Twitter's Streaming API      1%-10% of tweet volume      free
DataSift                     none                        0.15-0.20 $ / unit
Gnip (acquired by Twitter)   none                        by quote
Lambda Architecture in Action (1)
Twitter Hosebird Client (hbc)
• Twitter Java API over Streaming API
Spring Framework
• Popular Java Framework used to modularize part of the logic (sensor and serving layer)
Apache Kafka
• Simple messaging framework based on file system to distribute information to both batch and speed layer
Apache Avro
• Serialization system for efficient cross-language RPC and persistent data storage
JSON
• Open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs
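How a log-based messaging layer can feed both the batch and the speed layer independently can be sketched with an in-memory stand-in. `TopicLog`, `publish` and `poll` are hypothetical names for this illustration and are not the Kafka client API.

```python
class TopicLog:
    """In-memory append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def publish(self, message):
        # Messages are only ever appended, never changed or removed.
        self._messages.append(message)

    def poll(self, consumer):
        """Return all messages this consumer has not seen yet."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:]
        self._offsets[consumer] = len(self._messages)
        return batch


log = TopicLog()
log.publish({"id": 1, "text": "first tweet"})
log.publish({"id": 2, "text": "second tweet"})

speed_layer_input = log.poll("speed-layer")   # reads continuously
log.publish({"id": 3, "text": "third tweet"})
batch_layer_input = log.poll("batch-layer")   # reads later, still sees everything
speed_layer_next = log.poll("speed-layer")    # only the message it has not seen

print(len(speed_layer_input), len(batch_layer_input), len(speed_layer_next))
```

Because each consumer keeps its own offset, a slow batch consumer does not hold back the speed layer, and neither consumer removes messages for the other.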
Lambda Architecture in Action (2)
Cloudera Distribution
• Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala
Cloudera Impala
• Distributed query execution engine that runs against data stored in HDFS and HBase
Apache ZooKeeper
• Distributed, highly available coordination service. Provides primitives such as distributed locks
Apache Storm
• Distributed, fault-tolerant real-time computation system
Apache Cassandra
• Distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
Lambda Demo
Facts & Figures
Currently in total
• 8.8 TB data stored
• 3.9 TB Raw Data
• 800 GB Table Data
• 1.9 TB Index data
25 active Twitter feeds
• 8 million tweets per day
• Daily growth rate: 9.2 GB
• 5.5 GB raw data
• 1.2 GB Impala table data
• 2.5 GB Solr index data
Cluster of 22 nodes
• ~100 processors
• ~112.2 TB HD capacity in total; 35% used
• >1 TB RAM
Summary and Outlook
Summary – The Lambda Architecture
• Can discard batch views and real-time views and recreate everything from scratch
• Mistakes corrected via re-computation
• Scalability through platform and distribution
• Data storage layer optimized independently from query resolution layer
• Still in an early stage … but a very interesting idea!
- Today a zoo of technologies is needed => Infrastructure group might not like it
- Better with so-called Hadoop distributions and Hadoop V2 (YARN)
- Logic has to be implemented twice (speed and batch layer)
=> Kappa architecture?
“Kappa Architecture”
Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
[Diagram: Incoming Data (social, mobile, IoT, …) enters the Sensor Layer and is passed through the Distribution Layer to the Speed Layer (Process stream -> Realtime increment -> Incremented information); the Serving Layer exposes the real time views through a DataService for Visualization. All data is retained for Replay. A Batch Layer is used for analytical analysis only (Precomputed analytics -> analytic view -> DataService).]
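The replay idea can be sketched as follows: there is a single processing code path, and a logic change is rolled out by running the new version over the retained log from the start. The hashtag log and the `build_view` function are made-up examples for this illustration.

```python
from collections import Counter

# Retained event log (hypothetical hashtag stream).
event_log = ["#Data", "#data", "#kafka", "#DATA"]


def build_view(log, normalise):
    """Run the single stream-processing code path over the log from the start."""
    view = Counter()
    for event in log:
        view[normalise(event)] += 1
    return view


# Version 1 of the logic treats differently-cased hashtags as distinct.
view_v1 = build_view(event_log, normalise=lambda tag: tag)

# Version 2 fixes that. Replaying the same log yields a corrected view,
# which can then replace view_v1 in the serving layer.
view_v2 = build_view(event_log, normalise=str.lower)

serving_view = view_v2
print(view_v1["#data"], serving_view["#data"])
```

Compared with the Lambda Architecture, the logic exists only once; the price is that the log must be retained long enough to replay it.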
http://www.digitallifeplus.com/18913/what-happens-online-in-60-seconds-infographic/
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
Marz, N. & Warren, J. (2013) Big Data. Manning. http://manning.com/marz/
Q & A
Ulises Fasoli
Senior Consultant – Application Development
Tel. +41 21 321 47 00
ulises.fasoli@trivadis.com

 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
IoT Architecture - are traditional architectures good enough?
IoT Architecture - are traditional architectures good enough?IoT Architecture - are traditional architectures good enough?
IoT Architecture - are traditional architectures good enough?
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)
 

Big Data and Fast Data combined – is it possible?

  • 1. BÂLE BERNE BRUGG DUSSELDORF FRANCFORT S.M. FRIBOURG E.BR. GENÈVE HAMBOURG COPENHAGUE LAUSANNE MUNICH STUTTGART VIENNE ZURICH Big Data and Fast Data combined – is it possible? Ulises Fasoli Swiss Data Forum 24.11.2015 - Lausanne
  • 2. Ulises Fasoli • Senior Consultant @ Trivadis – Lausanne • 8+ years of software development experience • Occasional blogger • Contact information : • Email : ulises.fasoli@trivadis.com • Blog: http://ufasoli.blogspot.com • Twitter: ufasoli
  • 3. Our company Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
  • 4. Agenda 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 5. Big Data and Fast Data, what is it?
  • 6. Big Data Definition (4 Vs) Characteristics of Big Data: Its Volume, Velocity and Variety in combination Time to action ? -> Big Data + Event Processing = Fast Data
  • 7. The world is changing … The model of Generating/Consuming Data has changed …. Old Model: few companies are generating data, all others are consuming data New Model: all of us are generating data, and all of us are consuming data
  • 10. Internet Of Things – Sensors are/will be everywhere • There are more devices tapping into the internet than people on earth • How do we prepare our systems/architecture for the future?
  • 11. The world is changing … new data stores. Problem of traditional (R)DBMS approach: complex object graph; schema evolution; semi-structured data; scaling. Polyglot persistence: using multiple data storage technologies (RDBMS + NoSQL + NewSQL + In-Memory). Relational tables: ORDER, ADDRESS, CUSTOMER, ORDER_LINES. Same order as one object: Order ID: 1001; Order Date: 15.9.2012. Customer: First Name: Peter; Last Name: Sample. Billing Address: Street: Somestreet 10; City: Somewhere; Postal Code: 55901. Line Items (Name, Quantity, Price): Ipod Touch, 1, 220.95; Monster Beat, 2, 190.00; Apple Mouse, 1, 69.90.
  • 12. The world is changing…New platforms evolving (i.e. Hadoop Ecosystem)
  • 13. Data as an Asset – Store everything? "Data is just too valuable to delete! We must store everything!" "Nonsense! Just store the data you know you need today!" It depends … Big Data technologies allow you to store the raw information from new and existing data sources so that you can later use it to create new data-driven products that you haven't thought about today!
  • 14. Agenda 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 16. What is a data system? • A (data) system that manages the storage and querying of data with a lifetime measured in years encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made. • A data system answers questions based on information that was acquired in the past
  • 17. What is a data system? - Goal • The goal of a data system is to compute arbitrary functions on arbitrary data. • Questions are answered by running functions that take data as input query = function (all data)
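The idea "query = function (all data)" can be illustrated with a minimal Python sketch. The names `run_query`, `page_views` and `count_home_views` are hypothetical and chosen only for illustration; the slide itself names no API.

```python
from typing import Callable, Iterable, TypeVar

Record = TypeVar("Record")
Result = TypeVar("Result")


def run_query(function: Callable[[Iterable[Record]], Result],
              all_data: Iterable[Record]) -> Result:
    """Answer a question by applying a function to the complete dataset."""
    return function(all_data)


# Hypothetical dataset: one record per page view.
page_views = [
    {"url": "/home", "user": "alice"},
    {"url": "/home", "user": "bob"},
    {"url": "/about", "user": "alice"},
]


def count_home_views(records):
    """One possible 'arbitrary function': count views of the /home page."""
    return sum(1 for record in records if record["url"] == "/home")


home_views = run_query(count_home_views, page_views)
print(home_views)
```

Running a function over all data on every query is conceptually simple but slow on large datasets, which is what motivates the pre-computed views discussed on the following slides.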
  • 18. Desired properties of a data system General Extensible Allow Ad-hoc queries Robust and fault tolerant Low latency read / updates Scalable Minimal maintenance Debuggable
  • 19. How do we build (data) systems today – Today's Architectures • CRUD pattern. What is the problem with this? • Lack of Human Fault Tolerance • Potential loss of information/data. Diagram: applications (Mobile, Web, RIA, Rich Client) query a mutable database (RDBMS, NoSQL, NewSQL) that is the source of truth. The source of truth is mutable!
  • 20. Lack of Human Fault Tolerance. Bugs will be deployed to production over the lifetime of a data system. Operational mistakes will be made. Humans are part of the overall system • Just like hard disks, CPUs, memory, software • Design for human error like you design for any other fault. Examples of human error • Deploy a bug that increments counters by two instead of by one • Accidentally delete data from database • Accidental DOS on important internal service. Worst two consequences: data loss or data corruption. As long as an error doesn't lose or corrupt good data, you can fix what went wrong.
  • 21. Lack of Human Fault Tolerance – Immutability vs. Mutability The U and D in CRUD A mutable system updates the current state of the world Mutable systems inherently lack human fault-tolerance Easy to corrupt or lose data
  • 22. Lack of Human Fault Tolerance – Immutability vs. Mutability. An immutable system captures historical records of events. Events happen at a particular time and are always true. Immutability restricts the range of errors causing data loss/data corruption. Vastly more human fault-tolerant. Conclusion: Your source of truth should always be immutable.
  • 23. A different kind of architecture with immutable source of truth Instead of using our traditional approach … why not build data systems like this HDFS NoSQL NewSQL RDBMS View on Data Mobile Web RIA Rich Client Immutable data View on Data Application (Query) Source of Truth
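As a toy illustration of why an immutable source of truth is more human fault-tolerant, the sketch below replays the "increments counters by two instead of by one" bug mentioned earlier. All names (`events`, `buggy_count`, `fixed_count`) are hypothetical, and the data is made up for the example.

```python
# Immutable source of truth: a tuple of recorded events (never updated in place).
events = ("click", "click", "click")


def buggy_count(recorded_events):
    """The deployed bug: every event is counted twice."""
    return sum(2 for _ in recorded_events)


def fixed_count(recorded_events):
    """The corrected logic: every event is counted once."""
    return sum(1 for _ in recorded_events)


# Mutable approach: only the running total is stored, so the bug is baked
# into the state and the individual events are no longer available.
mutable_counter = 0
for _ in events:
    mutable_counter += 2  # bug applied directly to the stored state

# Immutable approach: the view is derived from the events, so a wrong view
# can be thrown away and recomputed once the bug is fixed.
corrupted_view = buggy_count(events)
repaired_view = fixed_count(events)

print(mutable_counter, corrupted_view, repaired_view)
```

In this tiny case the mutable total could be repaired by hand, but once correct and incorrect increments are mixed over time, the stored total alone no longer says which is which; the event log still does.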
  • 24. How to create the views on the Immutable data? On the fly ? Materialized, i.e. Pre-computed ? Immutable data View Immutable data Pre- Computed Views Query Query
  • 25. (Big) Data Processing Immutable data Pre- Computed Views Query?? Incoming Data How to compute the materialized views ? How to compute queries from the views ?
  • 26. Today Big Data Processing means Batch Processing … HDFS Data Store optimized for appending large results Queries Stream 1 Stream 2 Event Hadoop cluster (Map/Reduce) Hadoop Distributed File System
  • 27. Big Data Processing - Batch. Favorite Product List Changes (raw information => data): 01.02.13 Add iPAD 64GB; 10.03.13 Add Sony RX-100; 11.03.13 Add Canon GX-10; 11.03.13 Remove Sony RX-100; 12.03.13 Add Nikon S-100; 14.04.13 Add BoseQC-15; 15.04.13 Add MacBook Pro 15; 20.04.13 Remove Canon GX-10. compute => Current Favorite Product List (information => derived): iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15. derive => Current Product Count: 4.
  • 28. Big Data Processing – Batch. But we are not done yet … Timeline diagram: the source of truth is fully processed (batch-processed data, results) only up to the last full batch period; because of the time needed for the batch job, the data between that point and now is non-processed.
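The batch computation on this slide can be sketched in a few lines of Python. The change list is taken from the slide (with the product spelled "Canon GX-10" throughout, which is an assumption about the slide's intent); the function name `compute_favorites` is hypothetical.

```python
# Raw, immutable data: the list of favorite-product changes from the slide.
changes = [
    ("01.02.13", "Add", "iPAD 64GB"),
    ("10.03.13", "Add", "Sony RX-100"),
    ("11.03.13", "Add", "Canon GX-10"),
    ("11.03.13", "Remove", "Sony RX-100"),
    ("12.03.13", "Add", "Nikon S-100"),
    ("14.04.13", "Add", "BoseQC-15"),
    ("15.04.13", "Add", "MacBook Pro 15"),
    ("20.04.13", "Remove", "Canon GX-10"),
]


def compute_favorites(change_log):
    """Recompute the current favorite list from scratch by replaying all changes."""
    favorites = []
    for _date, action, product in change_log:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


current_favorites = compute_favorites(changes)  # derived information
current_count = len(current_favorites)          # derived from the derived list
print(current_favorites, current_count)
```

The list and the count are both derived values: they can be deleted at any time and recomputed from the change log.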
  • 29. Big Data Processing - Adding Real-Time Immutable data Batch Views Query ? Data Stream Realtime Views Incoming Data How to compute queries from the views ? How to compute real-time views
  • 30. Big Data Processing - Adding Real-Time. Immutable data, Favorite Product List Changes: 1.2.13 Add iPAD 64GB; 10.3.13 Add Sony RX-100; 11.3.13 Add Canon GX-10; 11.3.13 Remove Sony RX-100; 12.3.13 Add Nikon S-100; 14.4.13 Add BoseQC-15; 15.4.13 Add MacBook Pro 15. Incoming data stream, Stream of Favorite Product List Changes: Now Add Canon Scanner. compute => Views queried by the application, Current Favorite Product List: iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15, Canon GX-10, plus Canon Scanner (added now). Current Product Count: 6.
  • 31. Big Data Processing - Batch & Real Time. Timeline diagram: batch processing (e.g. Hadoop) works well for the fully processed data up to the last full batch period; real-time processing covers the time taken by the batch job, up to now; the end user sees a blended view. Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
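A minimal sketch of the "blended view" follows, assuming the batch view already holds the five products from the previous slide and the real-time side holds only the changes that arrived after the last batch run. The names `batch_view`, `recent_changes` and `blended_favorites` are hypothetical.

```python
# Pre-computed by the last batch run (everything up to 15.4.13 on the slide).
batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15", "Canon GX-10"]

# Changes that arrived after the batch run started; not yet in the batch view.
recent_changes = [("Add", "Canon Scanner")]


def blended_favorites(batch_favorites, changes_since_batch):
    """Answer a query by applying the recent changes on top of the batch view."""
    favorites = list(batch_favorites)  # copy: the batch view itself is not modified
    for action, product in changes_since_batch:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


blended = blended_favorites(batch_view, recent_changes)
print(blended, len(blended))
```

The batch view stays untouched; once the next batch run has absorbed "Add Canon Scanner", the recent-changes list can be emptied without changing the query result.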
  • 32. Agenda 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 35. Lambda Architecture A. All data is sent to both the batch and speed layer B. Master data set is an immutable, append-only set of data C. Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views. D. Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available E. Speed layer compensates for the high latency of updates to the Batch Views F. Uses fast incremental algorithms and read/write databases to produce real-time views G. Queries are resolved by getting results from both batch and real-time views
  • 36. Lambda Architecture Serving Layer Batch Layer Speed Layer Stores the immutable constantly growing dataset Computes arbitrary views from this dataset using BigData technologies (can take hours) Can be always recreated Computes the views from the constant stream of data it receives Needed to compensate for the high latency of the batch layer Incremental model and views are transient Responsible for indexing and exposing the pre-computed batch views so that they can be queried Exposes the incremented real-time views Merges the batch and the real-time views into a consistent result
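The interplay of the three layers can be sketched as a single in-memory Python class. This is only a toy model under strong assumptions: the class name `LambdaSketch` is hypothetical, the "views" are hashtag counts, and the batch run is treated as instantaneous, so the real-time view can simply be reset afterwards. A real deployment would use the technologies named later in the deck.

```python
from collections import Counter


class LambdaSketch:
    """Toy in-memory model of batch, speed and serving layers (hashtag counts)."""

    def __init__(self):
        self.master_dataset = []       # append-only source of truth
        self.batch_view = Counter()    # recomputed from scratch by the batch layer
        self.realtime_view = Counter() # incrementally updated by the speed layer

    def ingest(self, hashtag):
        """All data is sent to both the batch layer and the speed layer."""
        self.master_dataset.append(hashtag)
        self.realtime_view[hashtag] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all data, then swap it in."""
        self.batch_view = Counter(self.master_dataset)
        # Simplification: the batch run is instantaneous, so everything seen
        # so far is covered and the transient real-time view can be dropped.
        self.realtime_view = Counter()

    def query(self, hashtag):
        """Serving layer: merge batch and real-time views into one result."""
        return self.batch_view[hashtag] + self.realtime_view[hashtag]


system = LambdaSketch()
for tag in ["bigdata", "bigdata", "kafka"]:
    system.ingest(tag)
system.run_batch()
system.ingest("bigdata")  # arrives after the batch run, only in the real-time view

print(system.query("bigdata"), system.query("kafka"), system.query("storm"))
```

Note that the counting logic appears twice, once in `ingest` and once in `run_batch`, which mirrors the "logic has to be implemented twice" drawback raised in the summary.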
  • 37. Lambda Architecture Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning. Distribution Layer Speed Layer Precompute Views Visualization Batch Layer Precomputed information All data Incremented information Process stream Batch recompute Realtime increment Serving Layer batch view batch view real time view real time view DataService(Merge) Sensor Layer Incoming Data social mobile IoT …
  • 38. Agenda 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 39. Use Case and the Implementation
  • 40. Project Definition Build a platform for analysing Twitter communications in retrospective and in real-time Scalability and ability for future data fusion with other information is a must Provide a Web-based access to the analytical information Invest into new, innovative and not widely-proven technology – PoC environment, a pre-invest for future systems
  • 41. Anatomy of a tweet (fields grouped on the slide by dimension: Time, Space, Content, Social, Technical) { "created_at":"Sun Aug 18 14:29:11 +0000 2013", "id":369103686938546176, "id_str":"369103686938546176", "text":"Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash", "source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e", "truncated":false, "in_reply_to_status_id":null, "in_reply_to_status_id_str":null, "in_reply_to_user_id":null, "in_reply_to_user_id_str":null, "in_reply_to_screen_name":null, "user":{ "id":15032594, "id_str":"15032594", "name":"Juan Carlos Romo\u2122", "screen_name":"jcsromo", "location":"Sopuerta, Vizcaya", "url":null, "description":"Portugalujo, saturado de todo, de baloncesto no. Twitter personal.", "followers_count":1331, "friends_count":1326, "listed_count":31, "geo_enabled":true, "verified":false, "statuses_count":22787, "lang":"es", "contributors_enabled":false, "is_translator":false, … "profile_text_color":"666666", "profile_use_background_image":true, "default_profile":false, "default_profile_image":false }, "geo":{"type":"Point","coordinates":[43.28261499,-2.96464655]}, "coordinates":{"type":"Point","coordinates":[-2.96464655,43.28261499]}, "place":{"id":"cd43ea85d651af92", "url":"https://api.twitter.com/1.1/geo/id/cd43ea85d651af92.json", "place_type":"city", "name":"Bilbao", "full_name":"Bilbao, Vizcaya", "country_code":"ES", "country":"Espa\u00f1a", "bounding_box":{"type":"Polygon","coordinates":[[[-2.9860102,43.2136542], [-2.9860102,43.2901452],[-2.8803248,43.2901452],[-2.8803248,43.2136542]]]}, "attributes":{}}, "contributors":null, "retweet_count":0, "favorite_count":0, "entities":{"hashtags":[{"text":"quelosepash","indices":[58,70]}], "symbols":[], "urls":[], "user_mentions":[]}, "favorited":false, "retweeted":false, "filter_level":"medium", "lang":"es" }
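To make the dimensions concrete, the sketch below pulls a "when / where / what / who" tuple out of a trimmed-down copy of the tweet shown on the slide. The function name `extract_dimensions` and the choice of fields per dimension are assumptions for illustration; the deck does not show the project's actual extraction code.

```python
from datetime import datetime

# Trimmed copy of the tweet from the slide (only the fields used below).
tweet = {
    "created_at": "Sun Aug 18 14:29:11 +0000 2013",
    "text": "Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
    "user": {"screen_name": "jcsromo", "followers_count": 1331},
    # The "coordinates" field is ordered longitude, latitude.
    "coordinates": {"type": "Point", "coordinates": [-2.96464655, 43.28261499]},
    "entities": {"hashtags": [{"text": "quelosepash", "indices": [58, 70]}]},
}


def extract_dimensions(raw_tweet):
    """Map one tweet to the four dimensions: time, space, content, social."""
    when = datetime.strptime(raw_tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    longitude, latitude = raw_tweet["coordinates"]["coordinates"]
    hashtags = [tag["text"] for tag in raw_tweet["entities"]["hashtags"]]
    return {
        "when": when,
        "where": (latitude, longitude),
        "what": hashtags,
        "who": raw_tweet["user"]["screen_name"],
    }


dimensions = extract_dimensions(tweet)
print(dimensions["when"].isoformat(), dimensions["where"], dimensions["what"], dimensions["who"])
```

Tweets without geo information carry null in the coordinate fields, so production code would need to handle missing values; the sketch assumes they are present.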
  • 42. Views on Tweets in four dimensions when ⇐ where+what+who • Time series • Timelines where ⇐ when+what+who • Geo maps • Density plots what ⇐ when+where+who • Word clouds • Topic trends who ⇐ when+where+what • Social network graphs • Activity graphs Time Space Social Content Time Space Social Content Time Space Social Content Time SpaceSocial Content
  • 43. Accessing Twitter (source: limit; price). Twitter's Search API: 3200 / user, 5000 / keyword, 180 queries / 15 minutes; free. Twitter's Streaming API: 1%-10% of tweet volume; free. DataSift: none; 0.15-0.20$ / unit. Gnip (acquired by Twitter): none; by quote.
  • 44. Batch Layer Lambda Architecture Distribution Layer Speed Layer Visualization Precomputed information All data Incremented information Process stream Batch recompute Realtime increment Serving Layer batch view batch view real time view real time view DataService(Merge) Sensor Layer Incoming Data social mobile IoT …
  • 45. Lambda Architecture in Action (1) Twitter Hosebird Client (hbc) • Twitter Java API over Streaming API. Spring Framework • Popular Java framework used to modularize part of the logic (sensor and serving layer). Apache Kafka • Simple messaging framework based on the file system, used to distribute information to both batch and speed layer. Apache Avro • Serialization system for efficient cross-language RPC and persistent data storage. JSON • Open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.
  • 46. Lambda Architecture in Action (2) Cloudera Distribution • Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala Cloudera Impala • distributed query execution engine that runs against data stored in HDFS and HBase Apache Zookeeper • Distributed, highly available coordination service. Provides primitives such as distributed locks Apache Storm • distributed, fault-tolerant real-time computation system Apache Cassandra • distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
  • 48. Facts & Figures Currently in total • 8.8 TB data stored • 3.9 TB Raw Data • 800 GB Table Data • 1.9 TB Index data 25 active twitter feeds • 8 million tweets per day • Daily growth rate: 9.2 GB • 5.5 GB raw data • 1.2 GB Impala table data • 2.5 GB Solr index data Cluster of 22 nodes • ~100 processors • ~112.2 TB HD capacity in total; 35% used • >1 TB RAM
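A few back-of-the-envelope figures can be derived from the numbers on this slide. The sketch below only uses values stated above and assumes decimal units (1 GB = 10^9 bytes) and an even spread of tweets over the day; the variable names are hypothetical.

```python
# Figures taken from the slide.
tweets_per_day = 8_000_000
raw_gb_per_day = 5.5
impala_gb_per_day = 1.2
solr_gb_per_day = 2.5
total_capacity_tb = 112.2
used_fraction = 0.35

# The three daily components add up to the stated daily growth rate (9.2 GB).
daily_growth_gb = raw_gb_per_day + impala_gb_per_day + solr_gb_per_day

# Average ingest rate, assuming tweets arrive evenly over 24 hours.
tweets_per_second = tweets_per_day / (24 * 60 * 60)

# Average raw size per tweet, assuming 1 GB = 10**9 bytes.
raw_bytes_per_tweet = raw_gb_per_day * 10**9 / tweets_per_day

# Disk space in use across the cluster.
used_capacity_tb = total_capacity_tb * used_fraction

print(round(daily_growth_gb, 1), round(tweets_per_second, 1),
      round(raw_bytes_per_tweet, 1), round(used_capacity_tb, 2))
```

Under these assumptions the feeds deliver a little under a hundred tweets per second on average, at well under one kilobyte of raw data per tweet.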
  • 49. Agenda 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 51. Summary – The lambda architecture • Can discard batch views and real-time views and recreate everything from scratch • Mistakes corrected via re-computation • Scalability through platform and distribution • Data storage layer optimized independently from query resolution layer • Still at an early stage … but a very interesting idea! - Today a zoo of technologies is needed => Infrastructure group might not like it - Better with so-called Hadoop distributions and Hadoop V2 (YARN) - Logic has to be implemented twice (speed and batch layer) => Kappa architecture?
  • 52. “Kappa Architecture” Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning. Distribution Layer Speed Layer Visualization Batch Layer All data Incremented information Process stream Realtime increment Serving Layer real time view real time view DataService Sensor Layer Incoming Data social mobile IoT … Precomputed analytics analytic view DataService Batch Analytical analysis Replay
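The "Replay" arrow in this diagram can be illustrated with a small sketch: there is one stream-processing code path, and a change of logic is rolled out by replaying the retained log through the new version. The function names and the example log are hypothetical, and a plain Python list stands in for a replayable log.

```python
from collections import Counter

# Stand-in for a retained, replayable log of incoming hashtags.
event_log = ["BigData", "bigdata", "kafka"]


def process_stream_v1(events):
    """First version of the stream logic: count hashtags exactly as written."""
    view = Counter()
    for hashtag in events:
        view[hashtag] += 1
    return view


def process_stream_v2(events):
    """Changed logic: count hashtags case-insensitively."""
    view = Counter()
    for hashtag in events:
        view[hashtag.lower()] += 1
    return view


# The live view was built by version 1 as the events arrived.
live_view = process_stream_v1(event_log)

# To deploy version 2, replay the whole log through it and swap the views.
replayed_view = process_stream_v2(event_log)
live_view = replayed_view

print(live_view)
```

Unlike the Lambda sketch, there is no separate batch implementation to keep in sync; the trade-off is that the log has to be retained long enough to replay it.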
  • 54. Q & A. Ulises Fasoli, Senior Consultant – Application Development. Tel. +41 21 321 47 00, ulises.fasoli@trivadis.com