2014 © Trivadis
BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Big Data and Fast Data combined – is
it possible?
03.12.2014
DBTA Workshop | Big Data and Fast Data combined – is it possible?
Ulises Fasoli
DBTA Workshop 2014
03.12.2014 - Bern
Ulises Fasoli
• Consultant @ Trivadis – Lausanne
• 7+ years of software development experience
• Occasional blogger
• Contact information :
• Email : ulises.fasoli@trivadis.com
• Blog: http://ufasoli.blogspot.com
• Twitter: ufasoli
Our company
Trivadis is a market leader in IT consulting, system integration,
solution engineering and the provision of IT services focusing
on and technologies in Switzerland,
Germany and Austria.
We offer our services in the following strategic business fields:
OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
AGENDA
1. Big Data and Fast Data, what is it?
2. Architecting (Big) Data Systems
3. The Lambda Architecture
4. Use Case and the Implementation
5. Summary and Outlook
Big Data Definition (4 Vs)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
Time to action? -> Big Data + Event Processing = Fast Data
The world is changing …
The model of Generating/Consuming Data has changed ….
Old Model: few companies are generating data, all others are consuming
data
New Model: all of us are generating data, and all of us are consuming data
Infographic: what happens online in 60 seconds
Internet Of Things – Sensors
are/will be everywhere
There are more devices tapping into
the internet than people on earth
How do we prepare our
systems/architecture for the future?
Source: Cisco; The Economist
The world is changing …
new data stores
Problem of traditional (R)DBMS approach:
• Complex object graph
• Schema evolution
• Semi-structured data
• Scaling
Polyglot persistence
• Using multiple data storage technologies (RDBMS + NoSQL + NewSQL + In-Memory)
Relational tables: ORDER, ADDRESS, CUSTOMER, ORDER_LINES
Order
  ID: 1001
  Order Date: 15.9.2012
  Customer
    First Name: Peter
    Last Name: Sample
  Billing Address
    Street: Somestreet 10
    City: Somewhere
    Postal Code: 55901
  Line Items
    Name | Quantity | Price
    Ipod Touch | 1 | 220.95
    Monster Beat | 2 | 190.00
    Apple Mouse | 1 | 69.90
The world is changing … New platforms evolving (e.g.
Hadoop Ecosystem)
Data as an Asset – Store everything?
"Data is just too valuable to delete! We must store everything!"
"Nonsense! Just store the data you know you need today!"
It depends …
Big Data technologies allow you to store the raw information from
new and existing data sources so that you can later use it to create
new data-driven products that you haven't thought about today!
2. Architecting (Big) Data Systems
What is a data system?
• A (data) system that manages the storage and querying of
data with a lifetime measured in years encompassing
every version of the application to ever exist, every
hardware failure and every human mistake ever made.
• A data system answers questions based on information
that was acquired in the past
What is a data system? - Goal
query = function (all data)
• The goal of a data system is to compute arbitrary functions
on arbitrary data.
• Questions are answered by running functions that take data
as input
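The equation query = function(all data) can be illustrated with a minimal Python sketch. The record layout, the sample data and the names run_query, views_per_url and distinct_users are assumptions made up for this illustration; they do not come from the slides.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# Hypothetical raw data set: one record per page view.
ALL_DATA = [
    {"url": "/home", "user": "alice"},
    {"url": "/home", "user": "bob"},
    {"url": "/cart", "user": "alice"},
]


def run_query(all_data: Iterable[dict], fn: Callable[[Iterable[dict]], T]) -> T:
    """A query is nothing more than a function applied to all data."""
    return fn(all_data)


def views_per_url(records: Iterable[dict]) -> dict:
    """Count how often each URL was viewed."""
    counts: dict = {}
    for record in records:
        counts[record["url"]] = counts.get(record["url"], 0) + 1
    return counts


def distinct_users(records: Iterable[dict]) -> int:
    """Count the number of different users."""
    return len({record["user"] for record in records})


per_url = run_query(ALL_DATA, views_per_url)
users = run_query(ALL_DATA, distinct_users)
print(per_url, users)
```

Running every function over the complete data set is simple but slow on large volumes, which is what motivates the pre-computed views discussed later in the deck.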
Desired properties of a data system
• General
• Extensible
• Allows ad-hoc queries
• Robust and fault tolerant
• Low latency reads / updates
• Scalable
• Minimal maintenance
• Debuggable
How do we build (data) systems today – Today’s
Architectures
Source of Truth is mutable!
• CRUD pattern
What is the problem with this?
• Lack of Human Fault Tolerance
• Potential loss of
information/data
Diagram: applications (Mobile, Web, RIA, Rich Client) query a mutable database (RDBMS, NoSQL, NewSQL), which is the source of truth.
Lack of Human Fault Tolerance
Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
• Just like hard disks, CPUs, memory, software
• design for human error like you design for any other fault
Examples of human error
• Deploy a bug that increments counters by two instead of by one
• Accidentally delete data from database
• Accidental DoS on important internal service
Worst two consequences: data loss or data corruption
As long as an error doesn't lose or corrupt good data, you can fix what
went wrong
Lack of Human Fault Tolerance – Immutability vs.
Mutability
The U and D in CRUD
A mutable system updates the current
state of the world
Mutable systems inherently lack
human fault-tolerance
Easy to corrupt or lose data
An immutable system captures
historical records of events
Each event happens at a particular
time and is always true
Immutability restricts the range of errors causing data loss/data corruption
Vastly more human fault-tolerant
Conclusion: Your source of truth should always be immutable
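The difference can be sketched in Python using the "increment by two instead of by one" bug mentioned earlier. This is an illustration under assumptions: the store layout and all function names are hypothetical.

```python
# Mutable source of truth: only the current state is kept.
mutable_store = {"page_views": 0}


def buggy_update(store: dict) -> None:
    """Deployed bug: increments by two instead of by one."""
    store["page_views"] += 2


# Immutable source of truth: every event is appended, nothing is updated.
event_log: list = []


def record_view(log: list, timestamp: int) -> None:
    """Append a fact that stays true forever: a view happened at this time."""
    log.append(("page_view", timestamp))


def buggy_count(log: list) -> int:
    """The same bug, but it only affects the derived view, not the log."""
    return sum(2 for kind, _ in log if kind == "page_view")


def fixed_count(log: list) -> int:
    """Corrected view logic, recomputed from the untouched log."""
    return sum(1 for kind, _ in log if kind == "page_view")


for ts in (1, 2, 3):
    buggy_update(mutable_store)
    record_view(event_log, ts)

corrupted_state = mutable_store["page_views"]  # wrong value overwrote the truth
wrong_view = buggy_count(event_log)            # wrong, but only a derived value
repaired_view = fixed_count(event_log)         # fixed by re-computation
print(corrupted_state, wrong_view, repaired_view)
```

In the mutable case the wrong number has replaced the only copy of the state. In the immutable case the bug damaged only a derived value, so deploying the fix and recomputing is enough.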
A different kind of architecture with immutable source of
truth
Instead of using our traditional approach … why not build data systems like
this
Diagram: applications (Mobile, Web, RIA, Rich Client) query views on the data; the views are derived from immutable data, which is the source of truth. Storage technologies shown: HDFS, NoSQL, NewSQL, RDBMS.
How to create the views on the Immutable data?
On the fly ?
Materialized, i.e. Pre-computed ?
Diagram: on the fly, the query reads a view computed directly over the immutable data; materialized, the query reads pre-computed views derived from the immutable data.
(Big) Data Processing
Diagram: Incoming Data -> Immutable data -> Pre-Computed Views -> Query
How to compute the materialized views?
How to compute queries from the views?
Today Big Data Processing means Batch Processing …
Diagram: events from Stream 1 and Stream 2 are written to HDFS (Hadoop Distributed File System), a data store optimized for appending large results; queries run on the Hadoop cluster (Map/Reduce).
Big Data Processing - Batch
Favorite Product List Changes (raw information => data)
01.02.13 Add iPAD 64GB
10.03.13 Add Sony RX-100
11.03.13 Add Canon GX-10
11.03.13 Remove Sony RX-100
12.03.13 Add Nikon S-100
14.04.13 Add BoseQC-15
15.04.13 Add MacBook Pro 15
20.04.13 Remove Canon GX-10
compute => Current Favorite Product List (information => derived)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
derive => Current Product Count: 4
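The batch computation of the favorite product list can be sketched as a replay of the complete change log. This is a minimal Python illustration, not the implementation used in the project; the tuple layout and the name compute_favorites are assumptions, and "Canon GX10" and "Canon GX-10" are assumed to name the same product.

```python
from datetime import date

# Immutable change log (raw data), taken from the slide.
CHANGES = [
    (date(2013, 2, 1), "add", "iPAD 64GB"),
    (date(2013, 3, 10), "add", "Sony RX-100"),
    (date(2013, 3, 11), "add", "Canon GX-10"),
    (date(2013, 3, 11), "remove", "Sony RX-100"),
    (date(2013, 3, 12), "add", "Nikon S-100"),
    (date(2013, 4, 14), "add", "BoseQC-15"),
    (date(2013, 4, 15), "add", "MacBook Pro 15"),
    (date(2013, 4, 20), "remove", "Canon GX-10"),
]


def compute_favorites(changes: list) -> list:
    """Recompute the current favorite list from scratch by replaying all changes."""
    favorites: list = []
    # sorted() is stable, so changes on the same day keep their logged order.
    for _, action, product in sorted(changes, key=lambda change: change[0]):
        if action == "add" and product not in favorites:
            favorites.append(product)
        elif action == "remove" and product in favorites:
            favorites.remove(product)
    return favorites


current_favorites = compute_favorites(CHANGES)  # derived information
current_count = len(current_favorites)          # derived from the derived list
print(current_favorites, current_count)
```

Because the log is never modified, the list and the count can be thrown away and recomputed at any time.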
Big Data Processing –
Batch
• Using only batch processing always leaves you with a portion of non-processed data.
Timeline diagram: fully processed data | last full batch period | time for batch job | now; batch-processed data vs. non-processed data; source of truth vs. results
But we are not done yet …
Big Data Processing - Adding Real-Time
Diagram: Incoming Data -> Immutable data -> Batch Views -> Query; Incoming Data -> Data Stream -> Realtime Views -> Query
How to compute real-time views?
How to compute queries from the views?
Big Data Processing - Adding Real-Time
Favorite Product List Changes (immutable data)
1.2.13 Add iPAD 64GB
10.3.13 Add Sony RX-100
11.3.13 Add Canon GX-10
11.3.13 Remove Sony RX-100
12.3.13 Add Nikon S-100
14.4.13 Add BoseQC-15
15.4.13 Add MacBook Pro 15
20.4.13 Remove Canon GX-10
Stream of Favorite Product List Changes (incoming data stream)
Now Add Canon Scanner
compute => Current Favorite Product List (views, read by the query)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon Scanner (now)
Current Product Count: 5
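Adding the real-time part can be sketched as a small incremental view that only tracks changes the batch view has not seen yet, blended with the batch view at query time. This is a Python illustration under assumptions: the class RealtimeFavorites and the function blended_favorites are hypothetical names, and the batch view is given as a constant.

```python
# Batch view as computed by the last batch run (from the slide).
BATCH_VIEW = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]


class RealtimeFavorites:
    """Incremental view over the changes that arrived after the last batch run."""

    def __init__(self) -> None:
        self.added: list = []
        self.removed: set = set()

    def apply(self, action: str, product: str) -> None:
        """Update the view with a single incoming change."""
        if action == "add":
            self.removed.discard(product)
            if product not in self.added:
                self.added.append(product)
        elif action == "remove":
            if product in self.added:
                self.added.remove(product)
            self.removed.add(product)
        else:
            raise ValueError(f"unknown action: {action}")


def blended_favorites(batch_view: list, realtime: RealtimeFavorites) -> list:
    """Query: combine the batch view with the real-time increments."""
    result = [p for p in batch_view if p not in realtime.removed]
    result += [p for p in realtime.added if p not in result]
    return result


realtime_view = RealtimeFavorites()
realtime_view.apply("add", "Canon Scanner")  # the "Now" event from the stream

current = blended_favorites(BATCH_VIEW, realtime_view)
print(current, len(current))
```

The incremental view stays small because it only has to cover the window since the last batch run.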
Big Data Processing -
Batch & Real Time
Timeline diagram: fully processed data | last full batch period | time for batch job | now
• batch processing worked fine here (e.g. Hadoop): the fully processed data
• real time processing works here: the most recent data, up to now
• blended view for end user
Adapted from Ted Dunning (March 2012):
http://www.youtube.com/watch?v=7PcmbI5aC20
3. The Lambda Architecture
Lambda Architecture
Lambda => Query = function(all data)
Diagram: Incoming Data flows to the Batch Layer (Immutable data -> Batch View) and to the Speed Layer (Data Stream -> Realtime View); the Serving Layer answers the Query from both views. The labels A to G mark the components of the diagram.
Lambda Architecture
A. All data is sent to both the batch and speed layer
B. Master data set is an immutable, append-only set of data
C. Batch layer pre-computes query functions from scratch, result is called Batch
Views. Batch layer constantly re-computes the batch views.
D. Batch views are indexed and stored in a scalable database to get particular
values very quickly. Swaps in new batch views when they are available
E. Speed layer compensates for the high latency of updates to the Batch Views
F. Uses fast incremental algorithms and read/write databases to produce real-
time views
G. Queries are resolved by getting results from both batch and real-time views
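Steps A to G can be sketched as a toy in-memory counter in Python. This is an illustration under assumptions, not the project's implementation: the class LambdaCounter and its method names are hypothetical, and a single process stands in for what would be separate distributed systems.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts occurrences of keys (e.g. hashtags)."""

    def __init__(self) -> None:
        self.master_dataset: list = []   # B: immutable, append-only
        self.batch_view = Counter()      # C/D: pre-computed and served
        self.realtime_view = Counter()   # E/F: incremental, transient

    def ingest(self, key: str) -> None:
        """A: every event is sent to both the batch and the speed layer."""
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        """C/D: recompute the batch view from scratch and swap it in.

        Simplification: the batch run sees the whole master data set, so the
        real-time view can be discarded completely afterwards.
        """
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        """G: resolve a query from both the batch and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaCounter()
for tag in ("#bigdata", "#fastdata", "#bigdata"):
    system.ingest(tag)

before_batch = system.query("#bigdata")  # answered by the speed layer only
system.run_batch()
system.ingest("#bigdata")
after_batch = system.query("#bigdata")   # batch view plus real-time increment
print(before_batch, after_batch)
```

In a real deployment the batch run takes hours, so only the part of the real-time view that the new batch view covers would be expired.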
Lambda Architecture
Batch Layer
• Stores the immutable, constantly growing dataset
• Computes arbitrary views from this dataset using Big Data technologies (can take hours)
• Can always be recreated
Speed Layer
• Computes the views from the constant stream of data it receives
• Needed to compensate for the high latency of the batch layer
• Incremental model and views are transient
Serving Layer
• Responsible for indexing and exposing the pre-computed batch views so that they can be queried
• Exposes the incremented real-time views
• Merges the batch and the real-time views into a consistent result
Lambda Architecture
Diagram: Sensor Layer (incoming data: social, mobile, IoT, …) -> Distribution Layer -> Batch Layer (All data -> Batch recompute -> Precomputed information) and Speed Layer (Process stream -> Realtime increment -> Incremented information) -> Serving Layer (batch views, real time views, DataService (Merge)) -> Visualization
Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
4. Use Case and the Implementation
Project Definition
• Build a platform for analysing Twitter communications in retrospect and in real time
• Scalability and the ability for future data fusion with other information are a must
• Provide Web-based access to the analytical information
• Invest in new, innovative and not widely proven technology
• PoC environment, a pre-investment for future systems
Anatomy of a tweet
Legend: Time, Space, Content, Social, Technical
{
"created_at":"Sun Aug 18 14:29:11 +0000 2013",
"id":369103686938546176,
"id_str":"369103686938546176",
"text":"Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
"source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e",
"truncated":false,
"in_reply_to_status_id":null,
"in_reply_to_status_id_str":null,
"in_reply_to_user_id":null,
"in_reply_to_user_id_str":null,
"in_reply_to_screen_name":null,
"user":{
"id":15032594,
"id_str":"15032594",
"name":"Juan Carlos Romo\u2122",
"screen_name":"jcsromo",
"location":"Sopuerta, Vizcaya",
"url":null,
"description":"Portugalujo, saturado de todo, de baloncesto no. Twitter personal.",
"protected":false,
"followers_count":1331,
"friends_count":1326,
"listed_count":31,
"created_at":"Fri Jun 06 21:21:22 +0000 2008",
"favourites_count":255,
"utc_offset":7200,
"time_zone":"Madrid",
"geo_enabled":true,
"verified":false,
"statuses_count":22787,
"lang":"es",
"contributors_enabled":false,
"is_translator":false,
…
"profile_image_url_https":"https://si0.twimg.com/profile_images/2649762203
be4973d9eb457a45077897879c47c8b7_normal.jpeg",
"profile_banner_url":"https://pbs.twimg.com/profile_banners/15032594/1371570
460",
"profile_link_color":"2FC2EF",
"profile_sidebar_border_color":"FFFFFF",
"profile_sidebar_fill_color":"252429",
"profile_text_color":"666666",
"profile_use_background_image":true,
"default_profile":false,
"default_profile_image":false,
"following":null,
"follow_request_sent":null,
"notifications":null},
"geo":{
"type":"Point","coordinates":[43.28261499,-2.96464655]},
"coordinates":{"type":"Point","coordinates":[-2.96464655,43.28261499]},
"place":{"id":"cd43ea85d651af92",
"url":"https://api.twitter.com/1.1/geo/id/cd43ea85d651af92.json",
"place_type":"city",
"name":"Bilbao",
"full_name":"Bilbao, Vizcaya",
"country_code":"ES",
"country":"Espa\u00f1a",
"bounding_box":{"type":"Polygon","coordinates":[[[-2.9860102,43.2136542],
[-2.9860102,43.2901452],[-2.8803248,43.2901452],[-2.8803248,43.2136542]]]},
"attributes":{}},
"contributors": null,
"retweet_count":0,
"favorite_count":0,
"entities":{"hashtags":[{"text":"quelosepash","indices":[58,70]}],
"symbols":[],
"urls":[],
"user_mentions":[]},
"favorited":false,
"retweeted":false,
"filter_level":"medium",
"lang":"es"
}
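Grouping the tweet fields by dimension can be sketched in Python with the standard json module. This is an illustration under assumptions: the function name tweet_dimensions and the choice of fields per dimension are made up for this sketch, and the sample is a shortened copy of the tweet shown on the slide.

```python
import json
from datetime import datetime

# Shortened copy of the tweet from the slide (only the fields used below).
RAW_TWEET = r"""
{
  "created_at": "Sun Aug 18 14:29:11 +0000 2013",
  "text": "Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
  "user": {"screen_name": "jcsromo", "followers_count": 1331},
  "coordinates": {"type": "Point", "coordinates": [-2.96464655, 43.28261499]},
  "place": {"name": "Bilbao", "country_code": "ES"},
  "entities": {"hashtags": [{"text": "quelosepash", "indices": [58, 70]}]},
  "lang": "es"
}
"""


def tweet_dimensions(raw: str) -> dict:
    """Split one tweet into time, space, content, social and technical parts."""
    tweet = json.loads(raw)
    # "coordinates" is GeoJSON, so the order is longitude, latitude.
    lon, lat = tweet["coordinates"]["coordinates"]
    return {
        "time": datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "space": {"lat": lat, "lon": lon, "place": tweet["place"]["name"]},
        "content": {
            "text": tweet["text"],
            "hashtags": [h["text"] for h in tweet["entities"]["hashtags"]],
        },
        "social": {
            "screen_name": tweet["user"]["screen_name"],
            "followers": tweet["user"]["followers_count"],
        },
        "technical": {"lang": tweet["lang"]},
    }


dims = tweet_dimensions(RAW_TWEET)
print(dims["time"].isoformat(), dims["space"]["place"], dims["content"]["hashtags"])
```

Note that the deprecated "geo" field on the slide stores latitude first, while "coordinates" stores longitude first.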
Views on Tweets in four dimensions
Time: when ⇐ where+what+who
• Time series
• Timelines
Space: where ⇐ when+what+who
• Geo maps
• Density plots
Content: what ⇐ when+where+who
• Word clouds
• Topic trends
Social: who ⇐ when+where+what
• Social network graphs
• Activity graphs
Accessing Twitter
Source | Limit | Price
Twitter's Search API | 3200 / user; 5000 / keyword; 180 queries / 15 minutes | free
Twitter's Streaming API | 1%-10% of tweet volume | free
DataSift | none | 0.15-0.20 $ / unit
Gnip (acquired by Twitter) | none | by quote
Lambda Architecture
Open Source Frameworks for implementing a Lambda Architecture
Diagram: the same layer diagram as before (Sensor Layer, Distribution Layer, Batch Layer, Speed Layer, Serving Layer, Visualization).
Lambda Architecture in Action
Cloudera Distribution
• Distribution of Apache Hadoop: HDFS,
MapReduce, Hive, Flume, Pig, Impala
Cloudera Impala
• distributed query execution engine that runs
against data stored in HDFS and HBase
Apache Zookeeper
• Distributed, highly available coordination service.
Provides primitives such as distributed locks
Apache Storm & Trident
• distributed, fault-tolerant real-time computation
system
Apache Cassandra
• distributed database management system
designed to handle large amounts of data across
many commodity servers, providing high
availability with no single point of failure
Twitter Hosebird Client (hbc)
• Twitter's Java client for the Streaming API
Spring Framework
• Popular Java Framework used to modularize
part of the logic (sensor and serving layer)
Apache Kafka
• Simple messaging framework based on file
system to distribute information to both batch
and speed layer
Apache Avro
• Serialization system for efficient cross-language
RPC and persistent data storage
JSON
• open standard format that uses human-readable
text to transmit data objects consisting of
attribute–value pairs.
Facts & Figures
Currently in total
• 2.7 TB Raw Data
• 1.1 TB Pre-Processed data in
Impala
• 1 TB Solr indices for full text search
Cloudera 4.7.0 with Hadoop, Pig,
Hive, Impala and Solr
Kafka 0.7, Storm 0.9, DataStax
Enterprise Edition
14 active twitter feeds
• ~ 14 million tweets/day ( > 5 billion
tweets/year)
• ~ 8 GB/day raw data, compressed (2
DVDs)
• 66 GB storage capacity / day
(replication & views/results included)
Cluster of 10 nodes
• ~100 processors
• ~40 TB HD capacity in total; 46%
used
• >500 GB RAM
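The figures above can be cross-checked with a short back-of-the-envelope calculation in Python. The assumptions are mine, not the slide's: decimal units (1 TB = 1000 GB), constant daily rates, and 365 days per year. The variable names are hypothetical.

```python
# Figures taken from the slide.
TWEETS_PER_DAY = 14_000_000
RAW_GB_PER_DAY = 8          # compressed raw data
STORAGE_GB_PER_DAY = 66     # including replication and views/results
CAPACITY_TB = 40
USED_FRACTION = 0.46

# Yearly tweet volume: matches the "> 5 billion tweets/year" on the slide.
tweets_per_year = TWEETS_PER_DAY * 365

# Average compressed size of one raw tweet in bytes.
bytes_per_tweet = RAW_GB_PER_DAY * 1e9 / TWEETS_PER_DAY

# Storage consumed per year at the current rate, in TB.
storage_tb_per_year = STORAGE_GB_PER_DAY * 365 / 1000

# Days until the cluster is full if nothing is deleted or added.
free_tb = CAPACITY_TB * (1 - USED_FRACTION)
days_until_full = free_tb * 1000 / STORAGE_GB_PER_DAY

print(f"{tweets_per_year:,} tweets/year")
print(f"{bytes_per_tweet:.0f} bytes per compressed tweet")
print(f"{storage_tb_per_year:.1f} TB/year, full in about {days_until_full:.0f} days")
```

Under these assumptions the remaining capacity lasts a bit less than a year, which gives a feel for how quickly such a cluster has to grow.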
5. Summary and Outlook
Summary – The lambda architecture
• Can discard batch views and real-time views and recreate
everything from scratch
• Mistakes corrected via re-computation
• Scalability through platform and distribution
• Data storage layer optimized independently from query resolution layer
• Still at an early stage … but a very interesting idea!
• Today a zoo of technologies is needed => Infrastructure group might not like it
• Better with so-called Hadoop distributions and Hadoop V2 (YARN)
• Logic has to be implemented twice (speed and batch layer) => Kappa architecture?
“Kappa Architecture”
Diagram: Sensor Layer (incoming data: social, mobile, IoT, …) -> Distribution Layer -> Speed Layer (Process stream -> Realtime increment -> Incremented information) -> Serving Layer (real time views, DataService) -> Visualization. All data can be replayed into the Speed Layer (Replay); the Batch Layer is used for analytical analysis (Batch -> Precomputed analytics -> analytic view -> DataService).
Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
Further information
http://www.digitallifeplus.com/18913/what-happens-online-in-60-seconds-infographic/
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
http://manning.com/marz/ (Manning: Big Data)
Questions and answers
Ulises Fasoli
Consultant – Trivadis Lausanne
ulises.fasoli@trivadis.com
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Big data and fast data combined – is it possible

  • 1. Big Data and Fast Data combined – is it possible? Ulises Fasoli, DBTA Workshop 2014, 03.12.2014, Bern. 2014 © Trivadis. BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
  • 2. Ulises Fasoli: Consultant @ Trivadis, Lausanne; 7+ years of software development experience; occasional blogger. Contact information: Email: ulises.fasoli@trivadis.com; Blog: http://ufasoli.blogspot.com; Twitter: ufasoli
  • 3. Our company: Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria. We offer our services in the following strategic business fields: OPERATION. Trivadis Services takes over the interacting operation of your IT systems.
  • 4. AGENDA: 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 5. Big Data Definition (4 Vs). Characteristics of Big Data: its Volume, Velocity and Variety in combination. Time to action? -> Big Data + Event Processing = Fast Data
  • 6. The world is changing … The model of generating/consuming data has changed. Old model: few companies are generating data, all others are consuming data. New model: all of us are generating data, and all of us are consuming data.
  • 7. 60 SECONDS
  • 8. Internet of Things: sensors are/will be everywhere. There are more devices tapping into the internet than people on earth. How do we prepare our systems/architecture for the future? Source: Cisco. Source: The Economist.
  • 9. The world is changing … new data stores
     Problems of the traditional (R)DBMS approach: complex object graph; schema evolution; semi-structured data; scaling.
     Polyglot persistence: using multiple data storage technologies (RDBMS + NoSQL + NewSQL + In-Memory).
     Diagram: tables ORDER, ADDRESS, CUSTOMER, ORDER_LINES alongside an example order:
       Order ID: 1001, Order Date: 15.9.2012
       Customer: First Name: Peter, Last Name: Sample
       Billing Address: Street: Somestreet 10, City: Somewhere, Postal Code: 55901
       Line Items:
         Name          Quantity  Price
         Ipod Touch    1         220.95
         Monster Beat  2         190.00
         Apple Mouse   1         69.90
  • 10. The world is changing … New platforms evolving (e.g. the Hadoop ecosystem)
  • 11. Data as an Asset – Store everything? "Data is just too valuable to delete! We must store everything!" versus "Nonsense! Just store the data you know you need today!" It depends … Big Data technologies allow you to store the raw information from new and existing data sources so that you can later use it to create new data-driven products that you haven't thought of today.
  • 12. AGENDA: 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 13. What is a data system? A (data) system manages the storage and querying of data with a lifetime measured in years, encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made. A data system answers questions based on information that was acquired in the past.
  • 14. What is a data system? - Goal: query = function(all data). The goal of a data system is to compute arbitrary functions on arbitrary data. Questions are answered by running functions that take data as input.
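The idea "query = function(all data)" can be illustrated with a few lines of Python. This is only a sketch: the `page_views` records, the `views_per_url` query and the `run_query` helper are hypothetical names chosen for illustration and do not come from the talk.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_query(query: Callable[[Iterable[dict]], T], all_data: Iterable[dict]) -> T:
    """Answer a question by applying a function to the complete data set."""
    return query(all_data)


# Hypothetical master data set: one record per page view.
page_views = [
    {"url": "/home", "user": "alice"},
    {"url": "/home", "user": "bob"},
    {"url": "/about", "user": "alice"},
]


def views_per_url(records: Iterable[dict]) -> dict:
    """One possible query: count the page views for each URL."""
    counts: dict = {}
    for record in records:
        counts[record["url"]] = counts.get(record["url"], 0) + 1
    return counts


result = run_query(views_per_url, page_views)
print(result)
```

Any other question is just another function passed to `run_query`; the data itself is never modified by a query.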
  • 15. Desired properties of a data system: general; extensible; allows ad-hoc queries; robust and fault tolerant; low-latency reads/updates; scalable; minimal maintenance; debuggable
  • 16. How do we build (data) systems today – Today's Architectures. The source of truth is mutable (CRUD pattern). What is the problem with this? Lack of human fault tolerance; potential loss of information/data. Diagram labels: Mobile, Web, RIA, Rich Client; Application (Query); Mutable Database (RDBMS, NoSQL, NewSQL); Source of Truth.
  • 17. Lack of Human Fault Tolerance
     - Bugs will be deployed to production over the lifetime of a data system.
     - Operational mistakes will be made.
     - Humans are part of the overall system, just like hard disks, CPUs, memory and software: design for human error like you design for any other fault.
     - Examples of human error: deploying a bug that increments counters by two instead of by one; accidentally deleting data from the database; an accidental DoS on an important internal service.
     - The two worst consequences: data loss or data corruption.
     - As long as an error doesn't lose or corrupt good data, you can fix what went wrong.
  • 18. Lack of Human Fault Tolerance – Immutability vs. Mutability
     - The U and D in CRUD: a mutable system updates the current state of the world.
     - Mutable systems inherently lack human fault tolerance: it is easy to corrupt or lose data.
     - An immutable system captures historical records of events; each event happens at a particular time and is always true.
     - Immutability restricts the range of errors that can cause data loss or data corruption, which makes the system vastly more human fault-tolerant.
     - Conclusion: your source of truth should always be immutable.
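A minimal sketch of the difference, reusing the "increment by two instead of by one" bug mentioned earlier in the talk. The `ClickEvent` type and the handler functions are hypothetical; the point is only that the append-only log still allows the correct value to be recomputed, while the mutable counter does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    """An immutable fact: this user clicked at this time."""
    user: str
    timestamp: int


def increment_correct(counter: int) -> int:
    return counter + 1


def increment_buggy(counter: int) -> int:
    return counter + 2  # the deployed bug


mutable_counter = 0          # mutable source of truth (CRUD style)
event_log: list = []         # immutable, append-only source of truth


def handle_click(event: ClickEvent, increment) -> None:
    global mutable_counter
    mutable_counter = increment(mutable_counter)  # state is overwritten
    event_log.append(event)                       # the fact is only appended


# Three clicks are handled correctly, then the bug is deployed for two more.
for ts in range(3):
    handle_click(ClickEvent("alice", ts), increment_correct)
for ts in range(3, 5):
    handle_click(ClickEvent("bob", ts), increment_buggy)

# The mutable counter is corrupted and carries no trace of how it got there.
# The event log is untouched, so the count can simply be recomputed.
recomputed_counter = len(event_log)
print(mutable_counter, recomputed_counter)
```

Once the bug is fixed, the view is rebuilt from the log; nothing in the source of truth has to be repaired.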
  • 19. A different kind of architecture with an immutable source of truth. Instead of using our traditional approach … why not build data systems like this? Diagram labels: Mobile, Web, RIA, Rich Client; Application (Query); View on Data; HDFS, NoSQL, NewSQL, RDBMS; Immutable data; Source of Truth.
  • 20. How to create the views on the immutable data? On the fly? Or materialized, i.e. pre-computed? Diagrams: Immutable data, View, Query; and Immutable data, Pre-Computed Views, Query.
  • 21. (Big) Data Processing. How to compute the materialized views? How to compute queries from the views? Diagram labels: Incoming Data; Immutable data; Pre-Computed Views; Query.
  • 22. Today Big Data Processing means Batch Processing … Diagram labels: Stream 1, Stream 2, Event; HDFS (Hadoop Distributed File System); Hadoop cluster (Map/Reduce); data store optimized for appending large results; Queries.
  • 23. Big Data Processing - Batch
     Favorite Product List Changes (raw information => data):
       01.02.13 Add iPAD 64GB
       10.03.13 Add Sony RX-100
       11.03.13 Add Canon GX-10
       11.03.13 Remove Sony RX-100
       12.03.13 Add Nikon S-100
       14.04.13 Add BoseQC-15
       15.04.13 Add MacBook Pro 15
       20.04.13 Remove Canon GX-10
     compute => Current Favorite Product List (information => derived): iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15
     derive => Current Product Count: 4
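The derivation on this slide can be sketched as a replay of the change log. The event data is taken from the slide (with the spelling "Canon GX-10" unified); the tuple layout and the function name `compute_favorites` are assumptions made for this illustration.

```python
# Raw, immutable change log from the slide: (date, action, product).
changes = [
    ("01.02.13", "add", "iPAD 64GB"),
    ("10.03.13", "add", "Sony RX-100"),
    ("11.03.13", "add", "Canon GX-10"),
    ("11.03.13", "remove", "Sony RX-100"),
    ("12.03.13", "add", "Nikon S-100"),
    ("14.04.13", "add", "BoseQC-15"),
    ("15.04.13", "add", "MacBook Pro 15"),
    ("20.04.13", "remove", "Canon GX-10"),
]


def compute_favorites(change_log):
    """Batch computation: replay every change, in log order, from scratch."""
    favorites = []
    for _date, action, product in change_log:
        if action == "add" and product not in favorites:
            favorites.append(product)
        elif action == "remove" and product in favorites:
            favorites.remove(product)
    return favorites


current_favorites = compute_favorites(changes)   # derived information
product_count = len(current_favorites)           # derived from the derived list
print(current_favorites, product_count)
```

Because the list and the count are both derived, they can be thrown away and recomputed from the change log at any time.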
  • 24. Big Data Processing – Batch. Using only batch processing always leaves you with a portion of non-processed data. Diagram (timeline): fully processed data; last full batch period; time for batch job; now; batch-processed data vs. non-processed data; source of truth; results. But we are not done yet …
  • 25. Big Data Processing - Adding Real-Time. How to compute real-time views? How to compute queries from the views? Diagram labels: Incoming Data; Immutable data; Batch Views; Data Stream; Realtime Views; Query.
  • 26. Big Data Processing - Adding Real-Time
     Favorite Product List Changes (immutable data):
       1.2.13 Add iPAD 64GB
       10.3.13 Add Sony RX-100
       11.3.13 Add Canon GX-10
       11.3.13 Remove Sony RX-100
       12.3.13 Add Nikon S-100
       14.4.13 Add BoseQC-15
       15.4.13 Add MacBook Pro 15
       20.4.13 Remove Canon GX-10
     Stream of Favorite Product List Changes (incoming data stream):
       Now Add Canon Scanner
     compute => Current Favorite Product List (views): iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15, and from the stream: Canon Scanner
     Current Product Count: 5
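A small sketch of how the stream event changes the answer without re-running the batch job. The batch result is the list from the slide; the `RealtimeView` class and `query_favorites` function are hypothetical names used only for this illustration.

```python
# Result of the last batch run (taken from the slide).
batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]


class RealtimeView:
    """Keeps only the changes that arrived after the last batch run."""

    def __init__(self):
        self.deltas = []  # (action, product) in arrival order

    def on_event(self, action, product):
        # Incremental update: one cheap append per incoming event.
        self.deltas.append((action, product))


def query_favorites(batch, realtime):
    """Blend the batch view with the recent changes at query time."""
    favorites = list(batch)
    for action, product in realtime.deltas:
        if action == "add" and product not in favorites:
            favorites.append(product)
        elif action == "remove" and product in favorites:
            favorites.remove(product)
    return favorites


realtime_view = RealtimeView()
realtime_view.on_event("add", "Canon Scanner")  # the "Now" event on the slide

blended = query_favorites(batch_view, realtime_view)
print(blended, len(blended))
```

The batch view itself is not touched; the real-time view only bridges the gap until the next batch run has absorbed the new event.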
  • 27. Big Data Processing - Batch & Real Time. Diagram (timeline): fully processed data; last full batch period; time for batch job; now. Batch processing works fine for the fully processed data (e.g. Hadoop); real-time processing works for the most recent data; the end user gets a blended view. Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
  • 28. AGENDA: 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. The Use Case and the Implementation 5. Summary and Outlook
  • 29. Lambda Architecture. Lambda => Query = function(all data). Diagram labels: Incoming Data; Batch Layer (Immutable data, Batch View); Speed Layer (Data Stream, Realtime View); Serving Layer; Query. The components are labelled A to G.
  • 30. Lambda Architecture
     A. All data is sent to both the batch layer and the speed layer.
     B. The master data set is an immutable, append-only set of data.
     C. The batch layer pre-computes query functions from scratch; the results are called batch views. The batch layer constantly re-computes the batch views.
     D. Batch views are indexed and stored in a scalable database so that particular values can be retrieved very quickly. New batch views are swapped in when they become available.
     E. The speed layer compensates for the high latency of updates to the batch views.
     F. It uses fast incremental algorithms and read/write databases to produce real-time views.
     G. Queries are resolved by getting results from both the batch and the real-time views.
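Steps A to G can be sketched in a single process as follows. This is a toy model under stated assumptions, not the implementation from the talk: the `LambdaSketch` class, the hashtag-counting query and all method names are hypothetical, and real systems run the layers on separate, distributed components.

```python
from collections import Counter


class LambdaSketch:
    """Toy, single-process model of the Lambda Architecture (hashtag counts)."""

    def __init__(self):
        self.master = []                 # B: immutable, append-only master data set
        self.batch_view = Counter()      # D: batch view currently being served
        self.realtime_view = Counter()   # F: incrementally updated real-time view

    def ingest(self, hashtag):
        # A: every record goes to both the batch layer and the speed layer.
        self.master.append(hashtag)
        self.realtime_view[hashtag] += 1  # F: fast incremental update

    def run_batch(self):
        # C: recompute the batch view from scratch over a snapshot of all data.
        snapshot_size = len(self.master)
        new_view = Counter(self.master[:snapshot_size])
        # D: swap in the new batch view once it is complete.
        self.batch_view = new_view
        # E: the real-time view only needs to cover records the batch missed.
        self.realtime_view = Counter(self.master[snapshot_size:])

    def query(self, hashtag):
        # G: a query merges the results of both views.
        return self.batch_view[hashtag] + self.realtime_view[hashtag]


system = LambdaSketch()
for tag in ["#bigdata", "#fastdata", "#bigdata"]:
    system.ingest(tag)

before_batch = system.query("#bigdata")   # answered by the real-time view only
system.run_batch()
system.ingest("#bigdata")                  # arrives after the batch run
after_batch = system.query("#bigdata")    # batch view plus real-time view
print(before_batch, after_batch)
```

In this sketch the real-time view is reset after every batch run, which mirrors the idea that real-time views are transient while the master data set is never modified.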
  • 31. Lambda Architecture
     Batch Layer:
     - Stores the immutable, constantly growing dataset
     - Computes arbitrary views from this dataset using Big Data technologies (can take hours)
     - Can always be recreated
     Speed Layer:
     - Computes the views from the constant stream of data it receives
     - Needed to compensate for the high latency of the batch layer
     - Incremental model; the views are transient
     Serving Layer:
     - Responsible for indexing and exposing the pre-computed batch views so that they can be queried
     - Exposes the incremented real-time views
     - Merges the batch and the real-time views into a consistent result
  • 32. Lambda Architecture. Diagram labels: Sensor Layer; Incoming Data (social, mobile, IoT, …); Distribution Layer; Batch Layer; All data; Batch recompute; Precompute Views; Precomputed information; Speed Layer; Process stream; Realtime increment; Incremented information; Serving Layer; batch view; real time view; DataService (Merge); Visualization. Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
  • 33. AGENDA: 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 34. Project Definition
     - Build a platform for analysing Twitter communications retrospectively and in real time
     - Scalability and the ability to fuse the data with other information in the future are a must
     - Provide web-based access to the analytical information
     - Invest in new, innovative and not widely proven technology
     - PoC environment, a pre-investment for future systems
  • 35. Anatomy of a tweet. The fields fall into five categories: Time, Space, Content, Social, Technical.
     {
       "created_at":"Sun Aug 18 14:29:11 +0000 2013",
       "id":369103686938546176,
       "id_str":"369103686938546176",
       "text":"Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
       "source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e",
       "truncated":false,
       "in_reply_to_status_id":null,
       "in_reply_to_status_id_str":null,
       "in_reply_to_user_id":null,
       "in_reply_to_user_id_str":null,
       "in_reply_to_screen_name":null,
       "user":{
         "id":15032594,
         "id_str":"15032594",
         "name":"Juan Carlos Romo\u2122",
         "screen_name":"jcsromo",
         "location":"Sopuerta, Vizcaya",
         "url":null,
         "description":"Portugalujo, saturado de todo, de baloncesto no. Twitter personal.",
         "protected":false,
         "followers_count":1331,
         "friends_count":1326,
         "listed_count":31,
         "created_at":"Fri Jun 06 21:21:22 +0000 2008",
         "favourites_count":255,
         "utc_offset":7200,
         "time_zone":"Madrid",
         "geo_enabled":true,
         "verified":false,
         "statuses_count":22787,
         "lang":"es",
         "contributors_enabled":false,
         "is_translator":false,
         …
         "profile_image_url_https":"https://si0.twimg.com/profile_images/2649762203 be4973d9eb457a45077897879c47c8b7_normal.jpeg",
         "profile_banner_url":"https://pbs.twimg.com/profile_banners/15032594/1371570460",
         "profile_link_color":"2FC2EF",
         "profile_sidebar_border_color":"FFFFFF",
         "profile_sidebar_fill_color":"252429",
         "profile_text_color":"666666",
         "profile_use_background_image":true,
         "default_profile":false,
         "default_profile_image":false,
         "following":null,
         "follow_request_sent":null,
         "notifications":null},
       "geo":{"type":"Point","coordinates":[43.28261499,-2.96464655]},
       "coordinates":{"type":"Point","coordinates":[-2.96464655,43.28261499]},
       "place":{
         "id":"cd43ea85d651af92",
         "url":"https://api.twitter.com/1.1/geo/id/cd43ea85d651af92.json",
         "place_type":"city",
         "name":"Bilbao",
         "full_name":"Bilbao, Vizcaya",
         "country_code":"ES",
         "country":"Espa\u00f1a",
         "bounding_box":{"type":"Polygon","coordinates":[[[-2.9860102,43.2136542],[-2.9860102,43.2901452],[-2.8803248,43.2901452],[-2.8803248,43.2136542]]]},
         "attributes":{}},
       "contributors":null,
       "retweet_count":0,
       "favorite_count":0,
       "entities":{"hashtags":[{"text":"quelosepash","indices":[58,70]}],"symbols":[],"urls":[],"user_mentions":[]},
       "favorited":false,
       "retweeted":false,
       "filter_level":"medium",
       "lang":"es"
     }
  • 36. Views on Tweets in four dimensions (Time, Space, Content, Social)
     - when ⇐ where + what + who: time series, timelines
     - where ⇐ when + what + who: geo maps, density plots
     - what ⇐ when + where + who: word clouds, topic trends
     - who ⇐ when + where + what: social network graphs, activity graphs
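As an illustration of the four dimensions, the following sketch pulls when, where, what and who out of a tweet. The dictionary is a shortened excerpt of the tweet from the "Anatomy of a tweet" slide; the function name `dimensions` and the choice of hashtags as the "what" and the screen name as the "who" are assumptions made for this example.

```python
from datetime import datetime

# Shortened excerpt of the tweet shown on the "Anatomy of a tweet" slide.
tweet = {
    "created_at": "Sun Aug 18 14:29:11 +0000 2013",
    "text": "Baloncesto preparaci\u00f3n Eslovenia, Rajoy derrota a Merkel. #quelosepash",
    "user": {"screen_name": "jcsromo"},
    "coordinates": {"type": "Point", "coordinates": [-2.96464655, 43.28261499]},
    "entities": {"hashtags": [{"text": "quelosepash", "indices": [58, 70]}]},
}


def dimensions(tweet: dict) -> dict:
    """Extract the time, space, content and social dimensions of one tweet."""
    when = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    point = tweet.get("coordinates")
    # The "coordinates" field lists longitude first, then latitude.
    where = tuple(point["coordinates"]) if point else None
    what = [tag["text"] for tag in tweet["entities"]["hashtags"]]
    who = tweet["user"]["screen_name"]
    return {"when": when, "where": where, "what": what, "who": who}


dims = dimensions(tweet)
# Bucket the timestamp by hour, e.g. as the key of a time series.
hour_bucket = dims["when"].strftime("%Y-%m-%d %H:00")
print(dims, hour_bucket)
```

Grouping many tweets by `hour_bucket` yields a time series; grouping by `where`, `what` or `who` yields the other three kinds of view.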
  • 37. Accessing Twitter
     Source                     Limit                                                     Price
     Twitter's Search API       3200 / user, 5000 / keyword, 180 queries / 15 minutes     free
     Twitter's Streaming API    1%-10% of tweet volume                                    free
     DataSift                   none                                                      0.15-0.20 $ / unit
     Gnip (acquired by Twitter) none                                                      by quote
  • 38. Lambda Architecture: Open Source Frameworks for implementing a Lambda Architecture. Diagram labels: Sensor Layer; Incoming Data (social, mobile, IoT, …); Distribution Layer; Batch Layer; All data; Batch recompute; Precompute Views; Precomputed information; Speed Layer; Process stream; Realtime increment; Incremented information; Serving Layer; batch view; real time view; DataService (Merge); Visualization.
  • 39. Lambda Architecture in Action
     - Cloudera Distribution: distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala
     - Cloudera Impala: distributed query execution engine that runs against data stored in HDFS and HBase
     - Apache Zookeeper: distributed, highly available coordination service; provides primitives such as distributed locks
     - Apache Storm & Trident: distributed, fault-tolerant real-time computation system
     - Apache Cassandra: distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
     - Twitter Hosebird Client (hbc): Java client for Twitter's Streaming API
     - Spring Framework: popular Java framework, used to modularize part of the logic (sensor and serving layer)
     - Apache Kafka: simple messaging framework based on the file system, used to distribute information to both the batch and the speed layer
     - Apache Avro: serialization system for efficient cross-language RPC and persistent data storage
     - JSON: open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs
  • 40. Facts & Figures
     - Currently in total: 2.7 TB raw data; 1.1 TB pre-processed data in Impala; 1 TB Solr indices for full-text search
     - Software: Cloudera 4.7.0 with Hadoop, Pig, Hive, Impala and Solr; Kafka 0.7, Storm 0.9, DataStax Enterprise Edition
     - 14 active Twitter feeds: ~14 million tweets/day (> 5 billion tweets/year); ~8 GB/day raw data, compressed (2 DVDs); 66 GB storage capacity/day (replication and views/results included)
     - Cluster of 10 nodes: ~100 processors; ~40 TB HD capacity in total, 46% used; >500 GB RAM
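The figures on this slide can be cross-checked with some back-of-the-envelope arithmetic. The inputs are the approximate numbers from the slide; the assumptions that 1 TB = 1000 GB and that the daily growth stays constant are mine, so the "days until full" value is only a rough estimate, not a figure from the talk.

```python
# Approximate figures taken from the slide.
TWEETS_PER_DAY = 14_000_000
RAW_GB_PER_DAY = 8            # compressed raw data
STORAGE_GB_PER_DAY = 66       # including replication and views/results
TOTAL_CAPACITY_TB = 40
USED_FRACTION = 0.46

# Assumption: 1 TB = 1000 GB, constant daily volume.
tweets_per_year = TWEETS_PER_DAY * 365
raw_tb_per_year = RAW_GB_PER_DAY * 365 / 1000
free_tb = TOTAL_CAPACITY_TB * (1 - USED_FRACTION)
days_until_full = free_tb * 1000 / STORAGE_GB_PER_DAY

print(f"tweets per year:  {tweets_per_year:,}")
print(f"raw data per year: {raw_tb_per_year:.2f} TB")
print(f"free capacity:     {free_tb:.1f} TB")
print(f"days until full:   {days_until_full:.0f}")
```

The yearly tweet count of roughly 5.1 billion matches the "> 5 billion tweets/year" stated on the slide.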
  • 41. AGENDA: 1. Big Data and Fast Data, what is it? 2. Architecting (Big) Data Systems 3. The Lambda Architecture 4. Use Case and the Implementation 5. Summary and Outlook
  • 42. Summary – The lambda architecture
     - Can discard batch views and real-time views and recreate everything from scratch
     - Mistakes are corrected via re-computation
     - Scalability through platform and distribution
     - The data storage layer is optimized independently from the query resolution layer
     - Still at an early stage … but a very interesting idea!
     - Today a zoo of technologies is needed => the infrastructure group might not like it
     - Better with so-called Hadoop distributions and Hadoop V2 (YARN)
     - Logic has to be implemented twice (speed and batch layer) => Kappa architecture?
  • 43. “Kappa Architecture”. Diagram labels: Sensor Layer; Incoming Data (social, mobile, IoT, …); Distribution Layer; Speed Layer; Process stream; Realtime increment; Incremented information; Replay; Batch Layer; All data; Batch Analytical analysis; Precomputed analytics; Serving Layer; real time view; analytic view; DataService; Visualization. Adapted from: Marz, N. & Warren, J. (2013) Big Data. Manning.
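The "Replay" element of the diagram can be sketched as follows: the processing logic exists only once, as a stream processor, and a changed version of it is applied by replaying the retained event log. This is a simplified illustration; the event log contents, the two processing functions and the `replay` helper are hypothetical.

```python
from collections import Counter

# Hypothetical retained log of incoming events (hashtags, in arrival order).
event_log = ["#BigData", "#bigdata", "#FastData", "#BIGDATA"]


def count_v1(view: Counter, hashtag: str) -> None:
    """First version of the stream logic: case-sensitive counting."""
    view[hashtag] += 1


def count_v2(view: Counter, hashtag: str) -> None:
    """Improved version: hashtags are normalised to lower case."""
    view[hashtag.lower()] += 1


def replay(log, process) -> Counter:
    """Rebuild a view by pushing every logged event through the stream logic."""
    view = Counter()
    for event in log:
        process(view, event)
    return view


view_v1 = replay(event_log, count_v1)
# After the logic changes, the same log is replayed through the new version
# and the resulting view replaces the old one in the serving layer.
view_v2 = replay(event_log, count_v2)
print(view_v1, view_v2)
```

Compared with the Lambda Architecture, there is no second implementation of the logic for a batch layer; reprocessing is just another run of the stream code over the log.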
  • 44. Further information
     - http://www.digitallifeplus.com/18913/what-happens-online-in-60-seconds-infographic/
     - http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
     - http://manning.com/marz/ (Manning: Big Data)
  • 45. Questions and answers. Ulises Fasoli, Consultant, Trivadis Lausanne, ulises.fasoli@trivadis.com