© 2015, 2016 MapR Technologies
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
{"about" : "me"}
Tugdual “Tug” Grall
• MapR: Technical Evangelist
• MongoDB: Technical Evangelist
• Couchbase: Technical Evangelist
• eXo: CTO
• Oracle: Developer/Product Manager
– Mainly Java/SOA
• Developer in consulting firms
• Web:
– @tgrall
– http://tgrall.github.io
– tgrall
• NantesJUG co-founder
• Pet project: http://www.resultri.com
• tug@mapr.com
• tugdual@gmail.com
Big Data & Hadoop
In Production
Data Warehouse Optimization
Data Hub
Choose the best “connector”:
• File
• Sqoop
• ETL
• …
Use the aggregated data:
• In your applications
• To update other systems
• As an open data API
• …
(Diagram: Customer DB, Logs, … feeding Hadoop and NoSQL)
Financial Services
(Diagram: fraud detection and personalized offers built on the MapR Distribution for Hadoop, combining analytics with real-time operational applications. Elements shown: online transactions, fraud model, fraud investigation tool, fraud investigator, clickstream analysis, recommendations table, interactive marketer.)
Fault Tolerance
Fault Tolerance
• Hardware
• Software
• Developer?
Human fault tolerance
Lambda Architecture (λ) to the Rescue
A Little Bit of History
• Defined by Nathan Marz
• Formerly at BackType and Twitter
• Now at a new startup
• Creator of:
– Storm
– Cascalog
– ElephantDB
Lambda Architecture Requirements
• Fault-tolerant against both hardware failures & human errors
• Supports a variety of use cases, including low-latency queries as well as updates
• Linear scale-out capabilities
• Extensible, so that the system stays manageable and can easily accommodate new features
Lambda Architecture
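(Diagram: new data arrives as a stream and is sent to both the batch layer and the speed layer. The batch layer keeps the immutable master data and runs batch recomputes to precompute the batch views (View 1, View 2, … View N). The speed layer processes the stream and increments the real-time views (View 1, View 2, … View N). The serving layer answers each query by merging the batch and real-time views.)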
Data Ingestion
All data entering the system are dispatched to both
• the batch layer
• the speed layer
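A minimal sketch of this dual dispatch, in plain Python and independent of any framework: every incoming event is appended to an in-memory stand-in for the master dataset and, at the same time, applied to a speed-layer view. The `Event` and `LambdaIngestor` names, the flight-log fields and the per-airport count are assumptions made for illustration; a real system would more likely publish each event to a durable stream that both layers consume.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """One immutable fact, e.g. a row of a flight log (hypothetical schema)."""
    timestamp: str
    airport: str
    flight: str
    action: str


@dataclass
class LambdaIngestor:
    """Hypothetical dispatcher that feeds every event to both layers."""
    # Batch layer input: an append-only list standing in for the master dataset.
    master_dataset: List[Event] = field(default_factory=list)
    # Speed layer state: an incrementally updated count of events per airport.
    realtime_events_per_airport: Counter = field(default_factory=Counter)

    def ingest(self, event: Event) -> None:
        self.master_dataset.append(event)                     # batch layer: keep raw fact
        self.realtime_events_per_airport[event.airport] += 1  # speed layer: update view


ingestor = LambdaIngestor()
ingestor.ingest(Event("2016-02-04T10:00:00", "MUC", "EY123", "take-off"))
ingestor.ingest(Event("2016-02-04T10:09:00", "LHR", "LH17", "landing"))
print(len(ingestor.master_dataset))                 # 2
print(ingestor.realtime_events_per_airport["MUC"])  # 1
```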
Batch Layer
• Manages the master dataset: an immutable, append-only set of raw data
• Precomputes arbitrary query functions, called batch views
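The two batch-layer duties can be sketched in plain Python, under the assumption that the master dataset is small enough to hold in memory. The point is that a batch view is a pure function of all the raw data, so a bug in the view code (a human error) is repaired by fixing the code and recomputing, never by patching stored results. `recompute_batch_view` and the record layout are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Raw record: (timestamp, airport, flight, action). Hypothetical schema borrowed
# from the flight example used later in this deck.
Record = Tuple[str, str, str, str]

# A tuple is never modified in place: a stand-in for the immutable master dataset.
MASTER_DATASET: Tuple[Record, ...] = (
    ("2016-02-04T10:00:00", "MUC", "EY123", "take-off"),
    ("2016-02-04T10:05:00", "BRU", "SAS45", "take-off"),
    ("2016-02-04T10:09:00", "LHR", "LH17", "landing"),
)


def recompute_batch_view(master: Tuple[Record, ...],
                         key_fn: Callable[[Record], str]) -> Dict[str, int]:
    """A batch view is a function of ALL raw data, recomputed from scratch."""
    return dict(Counter(key_fn(r) for r in master))


# Human error: the deployed code groups by flight instead of by airport.
buggy_view = recompute_batch_view(MASTER_DATASET, lambda r: r[2])
# The raw data was never overwritten, so fixing the code and recomputing repairs the view.
fixed_view = recompute_batch_view(MASTER_DATASET, lambda r: r[1])
print(fixed_view)  # {'MUC': 1, 'BRU': 1, 'LHR': 1}
```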
Speed Layer
• The speed layer serves requests that are subject to low-latency requirements
• It uses fast, incremental algorithms and deals with recent data only
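The speed layer's fast, incremental algorithms can be illustrated with a running counter. In this plain-Python sketch (hypothetical class and field names, naive airline parsing), each event adjusts the view in constant time instead of triggering a recomputation over the full master dataset.

```python
from collections import Counter
from typing import Iterable, Tuple

Record = Tuple[str, str, str, str]  # (timestamp, airport, flight, action): hypothetical


class SpeedLayerView:
    """Real-time view maintained incrementally, one event at a time.

    It only sees recent events (those not yet absorbed by a batch view),
    so its state stays small and each update costs O(1).
    """

    def __init__(self) -> None:
        self.airborne_delta: Counter = Counter()  # airline -> take-offs minus landings

    def update(self, record: Record) -> None:
        _, _, flight, action = record
        airline = flight.rstrip("0123456789")  # naive: "AZ501" -> "AZ"
        if action == "take-off":
            self.airborne_delta[airline] += 1
        elif action == "landing":
            self.airborne_delta[airline] -= 1

    def update_all(self, records: Iterable[Record]) -> None:
        for record in records:
            self.update(record)


view = SpeedLayerView()
view.update_all([
    ("2016-02-04T10:07:00", "AMS", "BA99", "take-off"),
    ("2016-02-04T10:10:00", "CDG", "AF03", "landing"),
    ("2016-02-04T10:10:00", "FCO", "AZ501", "take-off"),
])
print(dict(view.airborne_delta))  # {'BA': 1, 'AF': -1, 'AZ': 1}
```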
Serving Layer
• The serving layer indexes the batch views so that they can be queried ad hoc with low latency
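A minimal sketch of the merge step, assuming both views are simple key-to-count dictionaries (hypothetical names `query_airborne`, `batch_view`, `realtime_view`). The batch figures reuse three rows of the "air-borne per airline" view that appears later in this deck; the real-time deltas are invented for the example.

```python
from typing import Dict


def query_airborne(airline: str,
                   batch_view: Dict[str, int],
                   realtime_view: Dict[str, int]) -> int:
    """Answer a query by merging the precomputed batch view with the
    real-time view covering data the batch layer has not absorbed yet."""
    return batch_view.get(airline, 0) + realtime_view.get(airline, 0)


batch_view = {"AF": 59, "AZ": 23, "BA": 167}  # values from the deck's example view
realtime_view = {"AZ": 1, "AF": -1}           # hypothetical recent deltas
print(query_airborne("AZ", batch_view, realtime_view))  # 24
print(query_airborne("BA", batch_view, realtime_view))  # 167
```

Adding the two views works here because counts are additive; a view such as "last known position of a flight" would need a different merge rule, for example preferring the most recent value.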
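Lambda Architecture: Compensate Batch
(Diagram: a timeline ending at "now"; the most recent data, not yet absorbed by the batch views, is covered by the speed layer.)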
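The speed layer only has to cover the gap between the last completed batch run and now. A minimal sketch, assuming each batch run is identified by a cutoff timestamp (real systems may track stream offsets or batch IDs instead): events at or before the cutoff are already absorbed into the batch views, and only later events still need the real-time view. When the next batch run completes with a later cutoff, the corresponding real-time state can be discarded. `split_by_cutoff` is a hypothetical helper.

```python
from typing import List, Tuple

Record = Tuple[str, str, str, str]  # (timestamp, airport, flight, action): hypothetical


def split_by_cutoff(records: List[Record],
                    batch_cutoff: str) -> Tuple[List[Record], List[Record]]:
    """Split events into those absorbed by the last batch run (timestamp <= cutoff)
    and those only the speed layer has seen (timestamp > cutoff).
    ISO-8601 timestamps in one fixed format compare correctly as strings."""
    absorbed = [r for r in records if r[0] <= batch_cutoff]
    not_absorbed = [r for r in records if r[0] > batch_cutoff]
    return absorbed, not_absorbed


events = [
    ("2016-02-04T10:00:00", "MUC", "EY123", "take-off"),
    ("2016-02-04T10:05:00", "BRU", "SAS45", "take-off"),
    ("2016-02-04T10:09:00", "LHR", "LH17", "landing"),
    ("2016-02-04T10:10:00", "CDG", "AF03", "landing"),
]
absorbed, pending = split_by_cutoff(events, "2016-02-04T10:05:00")
print(len(absorbed), len(pending))  # 2 2
```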
Lambda Architecture: Immutable Data + Views
http://openflights.org
timestamp            airport  flight  action
2016-02-04T10:00:00  MUC      EY123   take-off
2016-02-04T10:05:00  BRU      SAS45   take-off
2016-02-04T10:07:00  AMS      BA99    take-off
2016-02-04T10:09:00  LHR      LH17    landing
2016-02-04T10:10:00  CDG      AF03    landing
2016-02-04T10:10:00  FCO      AZ501   take-off
(immutable master dataset)
Views computed from the master dataset:
air-borne: 2307
air-borne per airline:
airline  planes
AF       59
AZ       23
BA       167
EY       19
LH       201
SAS      28
airport load:
airport  planes
AMS      69
CDG      44
BRU      31
FCO      10
HEL      17
LHR      101
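The numbers in these views come from a full dataset that is not shown; the six sample rows alone cannot produce them. As a sketch of how such a view is derived from the master dataset, the plain-Python function below (hypothetical names) computes, from the six rows shown, the net change in air-borne planes per airline: +1 per take-off, -1 per landing. Reading the airline from the alphabetic prefix of the flight number is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str, str]  # (timestamp, airport, flight, action)

# The six sample rows of the immutable master dataset shown above.
MASTER: Tuple[Record, ...] = (
    ("2016-02-04T10:00:00", "MUC", "EY123", "take-off"),
    ("2016-02-04T10:05:00", "BRU", "SAS45", "take-off"),
    ("2016-02-04T10:07:00", "AMS", "BA99", "take-off"),
    ("2016-02-04T10:09:00", "LHR", "LH17", "landing"),
    ("2016-02-04T10:10:00", "CDG", "AF03", "landing"),
    ("2016-02-04T10:10:00", "FCO", "AZ501", "take-off"),
)


def airline_of(flight: str) -> str:
    # Naive assumption: the airline code is the alphabetic prefix of the flight number.
    return flight.rstrip("0123456789")


def airborne_change_per_airline(master: Iterable[Record]) -> Dict[str, int]:
    """Net change in air-borne planes per airline: +1 per take-off, -1 per landing."""
    delta: Counter = Counter()
    for _, _, flight, action in master:
        delta[airline_of(flight)] += 1 if action == "take-off" else -1
    return dict(sorted(delta.items()))


print(airborne_change_per_airline(MASTER))
# {'AF': -1, 'AZ': 1, 'BA': 1, 'EY': 1, 'LH': -1, 'SAS': 1}
print(sum(airborne_change_per_airline(MASTER).values()))  # 2
```

Because the view is a pure function of the master dataset, rerunning it over more rows (or a corrected version of the code) always yields a consistent result.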
Implementation
Batch Layer: View Generation
(Diagram: events are stored as "raw" master data, processed, and turned into aggregated data such as View 1 and View 2.)
Apache Spark
• Cluster computing platform
• Extends the MapReduce model with:
– Streaming
– Interactive analytics
• Runs in memory
Spark Components
• Spark Core: the general execution engine
• Libraries on top of Spark Core: Spark SQL, Spark Streaming (streaming), MLlib (machine learning), GraphX (graph computation)
• Cluster managers: Mesos, Hadoop YARN
• Storage: distributed file systems (HDFS, MapR-FS, S3, …)
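Spark Jobs
(Diagram: the driver program (your application) creates a SparkContext, which connects to a cluster manager; the cluster manager allocates an executor on each worker, and each executor runs tasks.)
Driver program:
val sc = new SparkContext()
val rdd = sc.textFile("hdfs://…")
rdd.map(…)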
Spark Resilient Distributed Datasets “RDD”
(Diagram: sc.textFile loads a "Sensor RDD" whose partitions P1 to P4 are spread across the executors of three workers (W); P1 and P3 share one executor, P2 and P4 each run on another.)
P1: 8213034705, 95, 2.927373, jake7870, 0…
P2: 8213034705, 115, 2.943484, Davidbresler2, 1…
P3: 8213034705, 100, 2.951285, gladimacowgirl, 58…
P4: 8213034705, 117, 2.998947, daysrus, 95…
Spark Resilient Distributed Datasets
(Diagram: a transformation such as filter() turns an RDD into a new RDD; an action such as count() turns an RDD into a value.)
Transformations
• Process an RDD and return a new RDD
• Examples:
– map(): one value => another value
– mapToPair(): one value => a tuple (Java API)
– filter(): keeps the values/tuples that satisfy a given condition
– groupByKey(): groups values by key
– reduceByKey(): aggregates values by key
– join(), cogroup(), …: join RDDs
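To make the pair transformations concrete, here is a plain-Python analogue of what `groupByKey()` and `reduceByKey()` compute on a tiny word count. The helper names are hypothetical; this is not the Spark API and it ignores partitioning entirely.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Plain-Python analogue of groupByKey(): key -> list of all its values."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)


def reduce_by_key(pairs: Iterable[Tuple[K, V]], fn: Callable[[V, V], V]) -> Dict[K, V]:
    """Plain-Python analogue of reduceByKey(): combine each key's values with fn."""
    return {k: reduce(fn, vs) for k, vs in group_by_key(pairs).items()}


lines = ["take-off landing", "take-off"]
words = [w for line in lines for w in line.split()]  # like flatMap()
pairs = [(w, 1) for w in words]                       # like mapToPair()
print(group_by_key(pairs))                       # {'take-off': [1, 1], 'landing': [1]}
print(reduce_by_key(pairs, lambda a, b: a + b))  # {'take-off': 2, 'landing': 1}
```

In Spark itself, `reduceByKey()` combines values within each partition before shuffling, so it usually moves far less data across the network than `groupByKey()` followed by a reduction.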
Actions
• Process an RDD and return a value to the driver (or write data out)
• Examples:
– count(): counts the number of elements in the dataset
– first(): returns the first element
– take(n): returns an array of the first n elements
– foreach(): applies a function to each element
– collect(): returns all elements to the driver
– saveAsTextFile(): saves the elements as text files
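The key difference between the two kinds of operations is laziness: transformations only describe a new dataset, and nothing runs until an action asks for a result. The toy `MiniRDD` class below (hypothetical, single-process, and not Spark's API) sketches that behavior with Python generators and counts how often the mapped function actually runs.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


class MiniRDD:
    """Toy, single-process stand-in for an RDD. This is NOT Spark's API.

    Transformations only record work and return a new MiniRDD; actions run
    the recorded pipeline and return a result.
    """

    def __init__(self, source: Callable[[], Iterator]) -> None:
        self._source = source  # zero-argument function yielding the elements

    # Transformations: lazy, return a new MiniRDD.
    def map(self, fn: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: eager, run the pipeline and return a value.
    def count(self) -> int:
        return sum(1 for _ in self._source())

    def take(self, n: int) -> List:
        return list(islice(self._source(), n))

    def collect(self) -> List:
        return list(self._source())


def parallelize(data: Iterable) -> MiniRDD:
    items = list(data)
    return MiniRDD(lambda: iter(items))


calls = []


def times_ten(x: int) -> int:
    calls.append(x)  # record each time the function really runs
    return x * 10


rdd = parallelize([1, 2, 3, 4]).map(times_ten).filter(lambda v: v > 15)
print(len(calls))     # 0: transformations alone compute nothing
print(rdd.count())    # 3
print(rdd.collect())  # [20, 30, 40]
print(len(calls))     # 8: each action re-ran the whole pipeline
```

As in this sketch, Spark recomputes an RDD's lineage for every action unless the RDD is persisted, for example with `cache()`.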
Speed Layer
(Diagram: events are processed and written to a NoSQL store holding Real Time View 1 and Real Time View 2.)
Serving Layer: Aggregated Data
• Views are stored in a read/write database, for example:
• Apache HBase
• MapR-DB (binary and JSON tables)
• Cassandra
• MongoDB
• Elasticsearch
• …
Serving Layer
(Diagram: events are processed into aggregated data, a real-time view and a batch view, which are queried with SQL and explored with data visualization tools.)
-- Join a MapR-DB table, a Parquet file and a MongoDB collection
SELECT u.name, b.category, count(1) nb_review
FROM mongo.yelp.`user` u, dfs.yelp.`review.parquet` r,
     (SELECT business_id, flatten(categories) category FROM maprdb.`business`) b
WHERE u.user_id = r.user_id
  AND b.business_id = r.business_id
GROUP BY u.user_id, u.name, b.category
ORDER BY nb_review DESC
LIMIT 10;
+-----------+--------------+------------+
| name      | category     | nb_review  |
+-----------+--------------+------------+
| Rand      | Restaurants  | 1086       |
| J         | Restaurants  | 661        |
| Aileen    | Restaurants  | 499        |
| Michael   | Restaurants  | 496        |
+-----------+--------------+------------+
Events Capture?
Events Capture
(Diagram: sources such as Customer DB, APIs, logs, … are captured as files or through streaming into streams.)
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream processing of live data
• An extension of the core Spark API
(Diagram: data sources feed Spark Streaming, which writes to data sinks.)
Spark Streaming Architecture
• Divides the data stream into batches of X seconds (micro-batching)
• The result, called a DStream, is a sequence of RDDs
(Diagram: Spark Streaming cuts the input data stream into RDD batches, one per batch interval: RDD @ time 1 holds data from time 0 to 1, RDD @ time 2 data from time 1 to 2, RDD @ time 3 data from time 2 to 3.)
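The micro-batching idea can be sketched without Spark: cut a stream of timestamped events into fixed-length intervals and treat each interval's events as one batch. In this plain-Python sketch (hypothetical `micro_batches` helper and event format), each inner list plays the role of one RDD in the DStream.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload): hypothetical format


def micro_batches(stream: List[Event], interval: float) -> List[List[str]]:
    """Cut a stream into consecutive batches of `interval` seconds.

    Batch i holds the events with i * interval <= t < (i + 1) * interval, like the
    'data from time 0 to 1', 'data from time 1 to 2', ... slices of a DStream.
    Empty intervals produce empty batches. Assumes non-negative arrival times.
    """
    if not stream:
        return []
    n_batches = int(max(t for t, _ in stream) // interval) + 1
    batches: List[List[str]] = [[] for _ in range(n_batches)]
    for t, payload in stream:
        batches[int(t // interval)].append(payload)
    return batches


stream = [(0.2, "EY123 take-off"), (0.9, "SAS45 take-off"),
          (1.4, "BA99 take-off"), (2.7, "LH17 landing")]
for i, batch in enumerate(micro_batches(stream, 1.0), start=1):
    print(f"RDD @ time {i}: {batch}")
# RDD @ time 1: ['EY123 take-off', 'SAS45 take-off']
# RDD @ time 2: ['BA99 take-off']
# RDD @ time 3: ['LH17 landing']
```

One simplification: Spark Streaming assigns records to batches by the time they are received, whereas this sketch uses a time value carried by each event.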
What are Apache Kafka & MapR Streams?
• Publish-subscribe messaging
• Fast
• Scalable
• Durable
• Distributed
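The publish-subscribe model, and why it fits the Lambda Architecture, can be sketched with an in-memory log: messages are appended to a topic and never changed, and each consumer group keeps its own read offset. That lets a batch consumer and a speed-layer consumer read the same events independently, and lets a consumer rewind to replay history. The `InMemoryLog` class and the topic and group names are hypothetical; the sketch is loosely modeled on Kafka's topic, offset and consumer-group concepts and provides none of the durability, partitioning or distribution listed above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryLog:
    """Toy publish/subscribe log: append-only topics plus per-group read offsets."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[str]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1  # offset of the new message

    def poll(self, group: str, topic: str, max_messages: int = 10) -> List[str]:
        # Read from this group's current offset, then advance it.
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch

    def seek(self, group: str, topic: str, offset: int) -> None:
        # Rewind (or skip ahead) this group's read position, e.g. to replay.
        self._offsets[(group, topic)] = offset


log = InMemoryLog()
log.publish("flights", "EY123 take-off")
log.publish("flights", "LH17 landing")
print(log.poll("speed-layer", "flights"))                  # ['EY123 take-off', 'LH17 landing']
print(log.poll("batch-layer", "flights", max_messages=1))  # ['EY123 take-off']
print(log.poll("speed-layer", "flights"))                  # []
log.seek("speed-layer", "flights", 0)                      # rewind to replay
print(log.poll("speed-layer", "flights"))                  # ['EY123 take-off', 'LH17 landing']
```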
Summary
Lambda Architecture
(Diagram: the same Lambda Architecture (new data, batch layer, speed layer, serving layer), annotated with the technologies that implement its parts: a distributed file system, NoSQL databases and streams.)
Lambda Architecture in Action
(Diagram: geolocation events feed batch processing (MapReduce) for tax reduction reporting, a shortest-path graph algorithm (Titan on MapR-DB) for route optimization, and a real-time stream for online alerts.)
Lambda Architecture
• Fault-tolerant
• Use the batch layer to precompute queries over large or complex datasets
• Use the speed layer for “near real time” use cases
• Linear scale-out capabilities
• Tolerant of human error:
– Recompute data from the master dataset when needed
Q&A
@tgrall maprtech
tug@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Big Data Israel Meetup : Couchbase and Big Data
Big Data Israel Meetup : Couchbase and Big DataBig Data Israel Meetup : Couchbase and Big Data
Big Data Israel Meetup : Couchbase and Big Data
 
FOSDEM 2013 : Getting Started with Couchhbase Server 2.0
FOSDEM 2013 : Getting Started with Couchhbase Server 2.0FOSDEM 2013 : Getting Started with Couchhbase Server 2.0
FOSDEM 2013 : Getting Started with Couchhbase Server 2.0
 

Dernier

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Build Scalable and Reliable Apps with Lambda Architecture

  • 1. © 2015 MapR Technologies, © 2016 MapR Technologies. Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
  • 2. {"about" : "me"} Tugdual "Tug" Grall • MapR: Technical Evangelist • MongoDB: Technical Evangelist • Couchbase: Technical Evangelist • eXo: CTO • Oracle: Developer/Product Manager, mainly Java/SOA • Developer in consulting firms: Web • @tgrall • http://tgrall.github.io • tgrall • NantesJUG co-founder • Pet project: http://www.resultri.com • tug@mapr.com • tugdual@gmail.com
  • 3. Big Data & Hadoop in Production
  • 4. Data Warehouse Optimization
  • 5. Data Hub • Choose the best "connector": file, Sqoop, ETL, … • Use the aggregated data: in your applications, to update other systems, as an Open Data API, … (diagram labels: Customer DB, Logs, …, Hadoop, NoSQL)
  • 6. Financial Services (diagram on the MapR Distribution for Hadoop, spanning analytics and real-time operational applications): online transactions, clickstream analysis, fraud model, fraud detection, fraud investigation tool (fraud investigator), recommendations table, personalized offers (interactive marketer)
  • 7. Fault Tolerance
  • 8. Fault Tolerance: hardware, software, developer?
  • 9. Human fault tolerance
  • 10. (no text)
  • 11. (no text)
  • 12. (no text)
  • 13. Lambda Architecture to the rescue (λ)
  • 14. A little bit of history • Defined by Nathan Marz: formerly at BackType and Twitter, now at a new startup • Creator of Storm, Cascalog and ElephantDB
  • 15. Lambda Architecture Requirements • Fault-tolerant against both hardware failures and human errors • Supports a variety of use cases, including low-latency querying as well as updates • Linear scale-out capabilities • Extensible, so that the system is manageable and can accommodate newer features easily
  • 16. (no text)
  • 17. Lambda Architecture (diagram): the new data stream feeds the batch layer (immutable master data, batch recompute, precomputed batch views View 1 … View N) and the speed layer (process stream, increment real-time views View 1 … View N); the serving layer answers queries by merging the batch views and the real-time views.
  • 18. Data Ingestion: all data entering the system is dispatched to both • the batch layer • the speed layer (diagram: new data stream → batch layer, speed layer)
  • 19. Batch Layer • Manages the master dataset, an immutable, append-only set of raw data • Pre-computes arbitrary query functions, called batch views (diagram: immutable master data → batch recompute → precomputed batch views View 1 … View N)
  • 20. Speed Layer • Serves requests that are subject to low-latency requirements • Uses fast, incremental algorithms and deals with recent data only (diagram: process stream → increment real-time views View 1 … View N)
  • 21. Serving Layer • Indexes the batch views so that they can be queried ad hoc with low latency (diagram: queries merge batch views and real-time views)
  • 22. Lambda Architecture: Compensate (timeline diagram: the batch views cover the data up to the last batch run; the data not yet absorbed by the batch, up to now, is covered by the speed layer)
  • 23. Lambda Architecture: Immutable Data + Views, http://openflights.org
  • 24. Lambda Architecture: Immutable Data + Views (immutable master dataset)
      +---------------------+---------+--------+----------+
      | timestamp           | airport | flight | action   |
      +---------------------+---------+--------+----------+
      | 2016-02-04T10:00:00 | MUC     | EY123  | take-off |
      | 2016-02-04T10:05:00 | BRU     | SAS45  | take-off |
      | 2016-02-04T10:07:00 | AMS     | BA99   | take-off |
      | 2016-02-04T10:09:00 | LHR     | LH17   | landing  |
      | 2016-02-04T10:10:00 | CDG     | AF03   | landing  |
      | 2016-02-04T10:10:00 | FCO     | AZ501  | take-off |
      +---------------------+---------+--------+----------+
  • 25. Lambda Architecture: Immutable Data + Views: the immutable master dataset of slide 24, plus views computed from it
      air-borne: 2307
      air-borne per airline:
      +---------+--------+
      | airline | planes |
      +---------+--------+
      | AF      | 59     |
      | AZ      | 23     |
      | BA      | 167    |
      | EY      | 19     |
      | LH      | 201    |
      | SAS     | 28     |
      +---------+--------+
      airport load:
      +---------+--------+
      | airport | planes |
      +---------+--------+
      | AMS     | 69     |
      | CDG     | 44     |
      | BRU     | 31     |
      | FCO     | 10     |
      | HEL     | 17     |
      | LHR     | 101    |
      +---------+--------+
  • 26. Implementation
  • 27. Lambda Architecture (the same diagram as slide 17)
  • 28. Batch Layer: View Generation (diagram: events are kept as "raw" master data in storage, then processed into aggregated data: View 1, View 2)
  • 29. (no text)
  • 30. • Cluster computing platform • Extends the MapReduce model with streaming and interactive analytics • Runs in memory
  • 31. Spark components • Spark Core (general execution engine) • On top of it: Spark SQL, Spark Streaming (streaming), MLlib (machine learning), GraphX (graph computation) • Cluster managers: Mesos, Hadoop YARN • Storage: distributed file system (HDFS, MapR-FS, S3, …)
  • 32. Spark Jobs (diagram): the driver program (application) runs sc = new SparkContext, rDD = sc.textFile("hdfs://…"), rDD.map …; the cluster manager assigns the work to workers, each running an executor that executes tasks
  • 33. Spark Resilient Distributed Datasets "RDD" (diagram): sc.textFile loads the sensor data into an RDD split into partitions P1 to P4, spread over the executors of three workers (P4 on one, P1 and P3 on another, P2 on the third). Sample partition contents: P1: 8213034705, 95, 2.927373, jake7870, 0… P2: 8213034705, 115, 2.943484, Davidbresler2, 1… P3: 8213034705, 100, 2.951285, gladimacowgirl, 58… P4: 8213034705, 117, 2.998947, daysrus, 95…
  • 34. Spark Resilient Distributed Datasets: a transformation such as filter() turns an RDD into a new RDD; an action such as count() turns an RDD into a value
  • 35. Transformations • Process an RDD and return an RDD • Examples: • map(): one value => another value • mapToPair(): one value => a tuple • filter(): filters values/tuples on a given condition • groupByKey(): groups values by key • reduceByKey(): aggregates values by key • join(), cogroup(), …: join RDDs
  • 36. Actions • Process an RDD and return a value • Examples: • count(): counts the number of items in the dataset • first(): returns the first entry • take(n): returns an array of the first n elements • foreach(): applies a function to each element • collect(): returns all elements • saveAsTextFile(): saves each element to text files
  • 37. Speed Layer (diagram: events → processing → NoSQL, holding Real Time View 1 and Real Time View 2)
  • 38. Serving Layer: Aggregated Data • Views are stored in a read/write database: • Apache HBase • MapR-DB Binary & JSON • Cassandra • MongoDB • Elasticsearch • …
  • 39. Serving Layer (diagram: events → processing → real-time view, next to the aggregated batch view; both are queried with SQL for query/visualisation and dataviz)
  • 40. // Join MapR-DB Table, Parquet and MongoDB collection
      > SELECT u.name, b.category, count(1) nb_review
        FROM mongo.yelp.`user` u,
             dfs.yelp.`review.parquet` r,
             (select business_id, flatten(categories) category from maprdb.`business`) b
        WHERE u.user_id = r.user_id
          AND b.business_id = r.business_id
        GROUP BY u.user_id, u.name, b.category
        ORDER BY nb_review DESC
        LIMIT 10;
      +-----------+--------------+------------+
      | name      | category     | nb_review  |
      +-----------+--------------+------------+
      | Rand      | Restaurants  | 1086       |
      | J         | Restaurants  | 661        |
      | Aileen    | Restaurants  | 499        |
      | Michael   | Restaurants  | 496        |
      +-----------+--------------+------------+
  • 41. Events Capture?
  • 42. Events Capture (diagram: sources such as Customer DB, API, Logs, … captured through streaming, streams, or files)
  • 43. What is Spark Streaming? • Enables scalable, high-throughput, fault-tolerant stream processing of live data • Extension of the core Spark API (diagram: data sources → Spark Streaming → data sinks)
  • 44. Spark Streaming Architecture • Divides the data stream into batches of X seconds (micro-batching) • The result is called a DStream: a sequence of RDDs (diagram: input data stream → Spark Streaming → RDD batches, one per batch interval: RDD @ time 1 holds the data from time 0 to 1, RDD @ time 2 the data from time 1 to 2, RDD @ time 3 the data from time 2 to 3)
  • 45. What are Apache Kafka & MapR Streams? • Publish/subscribe messaging • Fast • Scalable • Durable • Distributed
  • 46. Summary
  • 47. Lambda Architecture (the diagram from slide 17, annotated with the underlying technologies: Streams, a distributed file system and NoSQL)
  • 48. Lambda Architecture in Action (diagram labels: batch processing (MapReduce), tax reduction reporting, shortest path graph algorithm (Titan on MapR-DB), route optimization, geolocation events, real-time stream, online alerts)
  • 49. Lambda Architecture • Fault-tolerant • Use the batch layer to precompute queries over complex/large datasets • Use the speed layer for "near real time" use cases • Linear scale-out capabilities • Recovers from human errors: recompute data from the master dataset when needed
  • 50. (no text)
  • 51. Q&A • @tgrall • maprtech • tug@mapr.com • Engage with us! MapR, maprtech, mapr-technologies
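Slides 19, 24 and 25 describe the batch layer. Raw events are only ever appended to an immutable master dataset, and views are recomputed from all of it. The Java sketch below illustrates that idea on a single machine, using plain collections instead of Spark; it is not the talk's implementation. The names BatchLayerSketch, FlightEvent, netAirborneByAirline and netAirportLoad are hypothetical. The sketch also assumes that a take-off adds one plane to the air and removes one from an airport, and that a landing does the opposite. The sample events say nothing about the starting state, so these views are net changes, not the absolute counts shown on slide 25.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal batch layer (hypothetical names): an append-only master dataset
// and batch views that are always recomputed from the full dataset.
class BatchLayerSketch {
    // One raw event, mirroring the columns of the slide 24 table.
    record FlightEvent(String timestamp, String airport, String flight, String action) {
        // Airline code = flight number without its trailing digits ("SAS45" -> "SAS").
        String airline() {
            return flight.replaceAll("[0-9]+$", "");
        }
    }

    private final List<FlightEvent> master = new ArrayList<>();

    // The only write operation: append a raw event. Nothing is ever updated in place.
    void append(FlightEvent e) {
        master.add(e);
    }

    // Read-only view of the master dataset; callers cannot modify it.
    List<FlightEvent> masterData() {
        return Collections.unmodifiableList(master);
    }

    // Batch view: net change in planes in the air per airline (take-off +1, landing -1).
    Map<String, Integer> netAirborneByAirline() {
        Map<String, Integer> view = new TreeMap<>();
        for (FlightEvent e : master) {
            view.merge(e.airline(), e.action().equals("take-off") ? 1 : -1, Integer::sum);
        }
        return view;
    }

    // Batch view: net change in planes on the ground per airport (landing +1, take-off -1).
    Map<String, Integer> netAirportLoad() {
        Map<String, Integer> view = new TreeMap<>();
        for (FlightEvent e : master) {
            view.merge(e.airport(), e.action().equals("landing") ? 1 : -1, Integer::sum);
        }
        return view;
    }

    public static void main(String[] args) {
        BatchLayerSketch batch = new BatchLayerSketch();
        batch.append(new FlightEvent("2016-02-04T10:00:00", "MUC", "EY123", "take-off"));
        batch.append(new FlightEvent("2016-02-04T10:09:00", "LHR", "LH17", "landing"));
        batch.append(new FlightEvent("2016-02-04T10:10:00", "CDG", "AF03", "landing"));
        // Each call recomputes the view from scratch over the whole master dataset.
        System.out.println(batch.netAirborneByAirline());
        System.out.println(batch.netAirportLoad());
    }
}
```

Each view method re-reads every event, so appending more events and calling it again always yields a view that is consistent with the full master dataset. That is the property the batch layer relies on.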
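Slides 18, 20, 21 and 22 divide the work between two layers. The speed layer updates real-time views incrementally, and the serving layer merges those views with the batch views. Once a batch run absorbs older data, the matching real-time state can be dropped. The sketch below shows one way to express this in Java. It assumes views that simply count events per key, plus a batch cutoff timestamp. ServingLayerSketch, Event, ingest, runBatch and query are hypothetical names, and everything runs in a single process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch (hypothetical names) of the speed layer plus the serving-layer merge.
// Views count events per key; time is an abstract long, and cutoffs never decrease.
class ServingLayerSketch {
    record Event(String key, long time) {}

    private final List<Event> master = new ArrayList<>();           // batch layer input, append-only
    private Map<String, Long> batchView = new HashMap<>();          // replaced wholesale by each batch run
    private long batchCutoff = Long.MIN_VALUE;                      // newest time absorbed by batchView
    private final List<Event> recent = new ArrayList<>();           // events not yet absorbed by a batch run
    private final Map<String, Long> realtimeView = new HashMap<>(); // incrementally updated

    // Ingestion: each event goes to the master data and to the speed layer.
    void ingest(Event e) {
        master.add(e);
        if (e.time() > batchCutoff) {
            recent.add(e);
            realtimeView.merge(e.key(), 1L, Long::sum); // incremental update, O(1) per event
        }
        // Late events (time <= batchCutoff) are only picked up by the next batch run.
    }

    // Batch run: recompute the view from all master data up to `cutoff`,
    // then drop the speed-layer state that the new batch view now covers.
    void runBatch(long cutoff) {
        Map<String, Long> view = new HashMap<>();
        for (Event e : master) {
            if (e.time() <= cutoff) view.merge(e.key(), 1L, Long::sum);
        }
        batchView = view;
        batchCutoff = cutoff;
        recent.removeIf(e -> e.time() <= cutoff);
        realtimeView.clear();
        for (Event e : recent) realtimeView.merge(e.key(), 1L, Long::sum);
    }

    // Query: merge the batch view with the real-time view.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        ServingLayerSketch s = new ServingLayerSketch();
        s.ingest(new Event("AF", 1));
        s.ingest(new Event("AF", 2));
        s.ingest(new Event("BA", 3));
        s.runBatch(2); // the batch absorbs t <= 2; the BA event stays in the real-time view
        s.ingest(new Event("AF", 4));
        System.out.println(s.query("AF") + " " + s.query("BA")); // 3 1
    }
}
```

The real-time view only holds events newer than the batch cutoff, so each event is counted by exactly one of the two views. This is what keeps the merged answer free of double counting.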
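Slides 34 to 36 distinguish transformations, which return a new RDD, from actions, which return a value. In Spark, transformations are lazy: only an action triggers the computation. Java Streams have a similar split between intermediate and terminal operations, so the sketch below uses them as a single-machine analogy. It is not the Spark API. TransformationsVsActions, lazinessDemo and countByAction are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Java Streams analogy for Spark's transformation/action split (not the Spark API).
class TransformationsVsActions {
    // Returns {predicate calls before the terminal operation, result count, calls after}.
    static long[] lazinessDemo(List<String> events) {
        AtomicInteger calls = new AtomicInteger();
        // Like rdd.filter(...): only describes the computation, nothing runs yet.
        Stream<String> takeOffs = events.stream().filter(ev -> {
            calls.incrementAndGet();
            return ev.startsWith("take-off");
        });
        long before = calls.get();
        // Like rdd.count(): the terminal operation runs the pipeline and returns a value.
        long count = takeOffs.count();
        return new long[] {before, count, calls.get()};
    }

    // Like mapToPair(ev -> (action, 1)) followed by reduceByKey(Integer::sum), on one machine.
    static Map<String, Integer> countByAction(List<String> events) {
        return events.stream().collect(Collectors.toMap(
                ev -> ev.split(" ")[0], // key: first word of the event
                ev -> 1,                // value: 1 per event
                Integer::sum,           // merge values that share a key
                TreeMap::new));         // sorted map for readable output
    }

    public static void main(String[] args) {
        List<String> events = List.of("take-off MUC", "landing LHR", "take-off AMS");
        long[] r = lazinessDemo(events);
        System.out.println("calls before count(): " + r[0] + ", count: " + r[1] + ", calls after: " + r[2]);
        System.out.println(countByAction(events));
    }
}
```

The predicate counter stays at zero until count() runs. This mirrors why a chain of Spark transformations costs nothing until an action such as count() or collect() is called.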
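Slide 44 explains micro-batching: the input stream is cut into fixed intervals, and each interval becomes one RDD of the DStream. The sketch below reproduces only the grouping step, in plain Java. MicroBatchSketch, Event and toBatches are hypothetical names, and each list stands in for one RDD. For simplicity, the sketch assumes each event carries a timestamp that decides its interval. Spark Streaming, by contrast, assigns records to the batch in which they are received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of micro-batching; not Spark Streaming's implementation.
class MicroBatchSketch {
    record Event(long timeMillis, String payload) {}

    // Groups events into consecutive intervals of `intervalMillis`. The map key is the
    // batch end time, so key 1000 holds events from [0, 1000), like "RDD @ time 1".
    // Spark Streaming produces an RDD for every interval, even an empty one;
    // this sketch simply has no entry for intervals without events.
    static Map<Long, List<Event>> toBatches(List<Event> events, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("batch interval must be positive");
        }
        Map<Long, List<Event>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchEnd = (Math.floorDiv(e.timeMillis(), intervalMillis) + 1) * intervalMillis;
            batches.computeIfAbsent(batchEnd, k -> new ArrayList<>()).add(e);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
                new Event(200, "a"), new Event(700, "b"),
                new Event(1500, "c"), new Event(2100, "d"));
        // With a 1-second interval: batch 1000 holds a and b, 2000 holds c, 3000 holds d.
        toBatches(stream, 1000).forEach((end, batch) ->
                System.out.println("batch ending at " + end + " ms: " + batch.size() + " event(s)"));
    }
}
```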
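Slide 45 lists the properties of Apache Kafka and MapR Streams as publish/subscribe messaging systems. Kafka's model is an append-only log that several consumers read independently, each one tracking its own position (offset). The toy below models only that idea, in memory and with a single partition, so it is neither durable nor distributed. TopicSketch, publish and poll are hypothetical names, not the Kafka or MapR Streams client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory, single-partition topic (hypothetical): models the publish/subscribe
// log idea only; it is not durable, not distributed, and not a real client API.
class TopicSketch {
    private final List<String> log = new ArrayList<>();           // append-only message log
    private final Map<String, Integer> offsets = new HashMap<>(); // next offset to read, per consumer group

    // Producer: append a message and return its offset. Reading never removes messages.
    synchronized int publish(String message) {
        log.add(message);
        return log.size() - 1;
    }

    // Consumer: return up to maxMessages unread messages for `group`, then advance its offset.
    synchronized List<String> poll(String group, int maxMessages) {
        if (maxMessages < 0) throw new IllegalArgumentException("maxMessages must be >= 0");
        int from = offsets.getOrDefault(group, 0);
        int to = (int) Math.min(log.size(), (long) from + maxMessages); // long math avoids overflow
        List<String> batch = new ArrayList<>(log.subList(from, to));
        offsets.put(group, to);
        return batch;
    }

    public static void main(String[] args) {
        TopicSketch flights = new TopicSketch();
        flights.publish("EY123 take-off MUC");
        flights.publish("LH17 landing LHR");
        // Two independent subscribers, e.g. the batch layer and the speed layer of slide 18.
        System.out.println(flights.poll("batch-layer", 10)); // both messages
        System.out.println(flights.poll("speed-layer", 1));  // first message only
        System.out.println(flights.poll("speed-layer", 1));  // second message
    }
}
```

Because messages stay in the log after they are read, the batch layer and the speed layer can consume the same stream at their own pace. That is how a single ingestion point can feed both layers.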

Speaker notes

  1. First of all, since we will be talking about big data applications, let's look at some very common use cases, at a high level.
  2. New applications, big data or not, must be "fault tolerant", and the Lambda Architecture has been built for that. But at which level?
  3. Hardware: commodity hardware, we know that it will fail, so we compensate for that in software (HDFS/MapR-FS, HBase/MapR-DB, ZooKeeper, …): there is infrastructure to support failure. What about the developer? Human beings are becoming the weakest link.
  4. So infrastructure built on Hadoop/MapR/distributed software is fault tolerant, but we still need to deal with HUMAN ERROR. Since we all make mistakes, the goal is to be able to recover from them. We all make mistakes: look at these big names.
  5. Facebook apologises after crash: Social network site went down for the third time in a month due to a 'configuration issue'
  6. Storm is a realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Cascalog: a fully-featured data processing and querying library for Clojure or Java. ElephantDB: a distributed database specialized in exporting key/value data from Hadoop.
  7. As you can guess, in application development, when we talk about architecture it is all about LAYERS.
  8. In this case the application generates "EVENTS". Everything we do generates events: a credit card payment, a commit to Git, a web page click, a tweet, … The events are used to manipulate the "data", but we can also use the events themselves as the main data.
  9. The events you generate are immutable: they have "happened", and they are time-based.
  10. Data Locality
  11. Resilient distributed datasets (RDDs) are the primary abstraction in Spark. An RDD is a collection of objects distributed across the nodes of a cluster, and data operations are performed on RDDs. Once created, RDDs are immutable. You can also persist, or cache, RDDs in memory or on disk. Spark RDDs are fault-tolerant: if a given node or task fails, the RDD can be reconstructed automatically on the remaining nodes and the job will complete.
  12. There are two types of data operations you can perform on an RDD: transformations and actions. A transformation returns an RDD; since RDDs are immutable, it returns a new RDD. An action returns a value.
  13. Data sources: socket, Kafka, Flume, HDFS, MQ (ZeroMQ, …), Twitter, …, or a custom implementation of Receiver.
  14. Store all events as raw data. Create intermediate views. Errors are fixed using re-computation. Everything is based on scalable and reliable storage and processing: • Distributed file system, with optimized formats (Parquet, Avro, Protobuf, …) • NoSQL engines: HBase, MapR-DB, Elasticsearch, Cassandra, MongoDB, … • Distributed processing: Spark, Drill (SQL)
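Slide 49 and speaker note 14 describe how the Lambda Architecture tolerates human error. The raw events are stored unchanged, so a bug in the view code is fixed by deploying corrected code and recomputing the views from the master dataset. The sketch below illustrates this with an airport-load view. RecomputeSketch, computeView, BUGGY_LOAD and FIXED_LOAD are hypothetical names, and the bug is invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical sketch: fixing a human error by recomputing a view from immutable raw data.
class RecomputeSketch {
    record Event(String airport, String action) {}

    // Immutable master dataset: List.of returns an unmodifiable list.
    static final List<Event> MASTER = List.of(
            new Event("CDG", "landing"),
            new Event("CDG", "take-off"),
            new Event("LHR", "landing"));

    // A view is a pure function of the whole master dataset.
    static Map<String, Integer> computeView(List<Event> master, ToIntFunction<Event> delta) {
        Map<String, Integer> view = new TreeMap<>();
        for (Event e : master) {
            view.merge(e.airport(), delta.applyAsInt(e), Integer::sum);
        }
        return view;
    }

    // Version 1 of the view code, with a bug: every event counts as +1.
    static final ToIntFunction<Event> BUGGY_LOAD = e -> 1;

    // Version 2, after the fix: landing +1, take-off -1.
    static final ToIntFunction<Event> FIXED_LOAD = e -> e.action().equals("landing") ? 1 : -1;

    public static void main(String[] args) {
        System.out.println("buggy view:      " + computeView(MASTER, BUGGY_LOAD));
        // The raw events were never modified, so deploying the fix and recomputing is enough.
        System.out.println("recomputed view: " + computeView(MASTER, FIXED_LOAD));
    }
}
```

Had the buggy code updated a mutable counter in place, the wrong values would have overwritten the only copy of the data. Keeping the raw events is what makes the recomputation possible.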
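Speaker note 11 states that a lost RDD can be reconstructed automatically. Spark does this through lineage: an RDD remembers the transformations that produced it from its parent data, so a lost partition can be recomputed instead of restored from a replica. The toy below models that idea for a single map-like transformation. LineageSketch, partition, lose and isMaterialized are hypothetical names, and real Spark lineage covers whole chains of transformations across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of lineage-based recovery (hypothetical; not Spark's implementation).
// A derived dataset remembers its parent partitions and the function that produced it,
// so a lost partition can be rebuilt rather than replicated.
class LineageSketch {
    private final List<List<Integer>> parent;       // re-readable input, e.g. files on a DFS
    private final UnaryOperator<Integer> fn;        // recorded transformation (the lineage)
    private final List<List<Integer>> materialized; // cached partitions; null = not computed or lost

    LineageSketch(List<List<Integer>> parent, UnaryOperator<Integer> fn) {
        this.parent = parent;
        this.fn = fn;
        this.materialized = new ArrayList<>();
        for (int i = 0; i < parent.size(); i++) materialized.add(null);
    }

    // Returns partition i, recomputing it from its parent partition if it is missing.
    List<Integer> partition(int i) {
        if (materialized.get(i) == null) {
            List<Integer> out = new ArrayList<>();
            for (Integer x : parent.get(i)) out.add(fn.apply(x));
            materialized.set(i, out);
        }
        return materialized.get(i);
    }

    // Simulates losing a cached partition, e.g. when an executor's node fails.
    void lose(int i) {
        materialized.set(i, null);
    }

    boolean isMaterialized(int i) {
        return materialized.get(i) != null;
    }

    public static void main(String[] args) {
        LineageSketch times10 = new LineageSketch(List.of(List.of(1, 2), List.of(3)), x -> x * 10);
        System.out.println(times10.partition(0)); // computed on first access
        times10.lose(0);
        System.out.println(times10.partition(0)); // rebuilt from lineage, same content
    }
}
```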