SlideShare une entreprise Scribd logo
1  sur  52
@tgrall#Devoxx #sparkstreaming
Build a Time Series Application
with Spark and HBase
Tugdual Grall
@tgrall
MapR
Carol Mac Donald
@caroljmcdonald
MapR
@tgrall#Devoxx #sparkstreaming
Agenda
• Time Series
• Apache Spark & Spark Streaming
• Apache HBase
• Lab
@tgrall#Devoxx #sparkstreaming
About the Lab
• Use Spark & HBase in MapR Cluster
• Option 1: Use a SandBox (Virtual Box VM located on USB
Key)
• Option 2: Use Cloud Instance (SSH/SCP only)
• Content:
• Option 1: spark-streaming-hbase-workshop.zip on USB
• Option 2: download zip from
https://github.com/tgrall/spark-streaming-hbase-workshop
@tgrall#Devoxx #sparkstreaming
Time Series
@tgrall#Devoxx #sparkstreaming
What is a Time Series?
• Stuff with timestamps
• sensor measurements
• system stats
• log files
• ….
@tgrall#Devoxx #sparkstreaming
Got Some Examples?
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
What do we need to do?
• Acquire
• Measurement, transmission, reception
• Store
• Individually, or grouped for some amount of time
• Retrieve
• Ad hoc, flexible, correlate and aggregate
• Analyze and visualize
• We facilitate this via retrieval
@tgrall#Devoxx #sparkstreaming
Acquisition
Not usually our problem
• Sensors
• Data collection – agents, raspberry pi
• Transmission – via LAN/Wan, Mobile Network, Satellites
• Receipt into system – listening daemon or queue, or
depending on use case writing directly to the database
@tgrall#Devoxx #sparkstreaming
Storage Choice
• Flat files
• Great for rapid ingest with massive data
• Handles essentially any data type
• Less good for data requiring frequent updates
• Harder to find specific ranges
• Traditional RDBMS
• Ingests up to ~10,000/ sec; prefers well structured (numerical) data;
expensive
• NoSQL (such as MapR-DB or HBase)
• Easily handle 10,000 rows / sec / node – True linear scaling
• Handles wide variety of data
• Good for frequent updates
• Easily scanned in a range
@tgrall#Devoxx #sparkstreaming
Specific Example
Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration
levels, salinity, voltage, currents, many others
• Typical project has 100 rigs
@tgrall#Devoxx #sparkstreaming
General Outline
10K samples / second / rig
x 100 rigs
= 1M samples / second
• But wait, there’s more
• Suppose you want to test your system
• Perhaps with a year of data
• And you want to load that data in << 1 year
• 100x real-time = 100M samples / second
@tgrall#Devoxx #sparkstreaming
Data Storage
• Typical time window is one hour
• Column names are offsets in time window
• Find series-uid in separate table
Key 13 43 73 103 …
…
series-uid.time-window 4.5 5.2 6.1 4.9
…
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
Why do we need NoSQL / HBase?
Relational Model
bottleneck
Key colB colC
val val val
xxx val val
Key colB colC
val val val
xxx val val
Key colB colC
val val val
xxx val val
Storage ModelRDBMS HBase
Distributed Joins, Transactions do
not scale
Data that is accessed together is
stored together
@tgrall#Devoxx #sparkstreaming
HBase is a ColumnFamily oriented Database
• Data is accessed and stored together:
• RowKey is the primary index
• Column Families group similar data by row key
CF_DATA
colA colB colC
Val val
val
CF_STATS
colA colB colC
val val
val
RowKey
series-abc.time-
window
series-efg.time-
window
Customer id Raw Data Stats
@tgrall#Devoxx #sparkstreaming
HBase is a Distributed Database
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Key
Range
xxxx
xxxx
CF1
colA colB colC
val val
val
CF2
colA colB colC
val val
val
Put, Get by Key
Data is automatically
distributed across the cluster
• Key range is used for horizontal
partitioning
@tgrall#Devoxx #sparkstreaming
Basic Table Operations
• Create Table, define Column Families before data is
imported
• but not the rows keys or number/names of columns
• Low level API, technically more demanding
• Basic data access operations (CRUD):
put Inserts data into rows (both create and update)
get Accesses data from one row
scan Accesses data from a range of rows
delete Delete a row or a range of rows or columns
@tgrall#Devoxx #sparkstreaming
Learn More
• Free Online Training: http://learn.mapr.com
• DEV 320 - Apache HBase Data Model and Architecture
• DEV 325 - Apache HBase Schema Design
• DEV 330 - Developing Apache HBase Applications: Basics
• DEV 335 - Developing Apache HBase Applications: Advanced
@tgrall#Devoxx #sparkstreaming
@tgrall#Devoxx #sparkstreaming
What is Spark?
• Cluster Computing Platform
• Extends “MapReduce” with
extensions
• Streaming
• Interactive Analytics
• Run in Memory
@tgrall#Devoxx #sparkstreaming
What is Spark?
Fast
• 100x faster than M/R
Logistic regression in Hadoop and Spark
@tgrall#Devoxx #sparkstreaming
What is Spark?
Ease of Development
• Write programs quickly
• More Operators
• Interactive Shell
• Less Code
@tgrall#Devoxx #sparkstreaming
What is Spark?
Multi Language Support
• Scala
• Python
• Java
• SparkR
@tgrall#Devoxx #sparkstreaming
What is Spark?
Deployment Flexibility
• Deployment
• Local
• Standalone
• Storage
• HDFS
• MapR-FS
• S3
• Cassandra
• YARN
• Mesos
@tgrall#Devoxx #sparkstreaming
Unified Platform
Spark SQL
Spark Streaming
(Streaming)
MLlib
(Machine Learning)
Spark Core (General execution engine)
GraphX
(Graph Computation)
@tgrall#Devoxx #sparkstreaming
Spark Components
Driver Program
(application)
SparkContext
Cluster Manager
Worker
Executor
Task Task
Worker
Executor
Task Task
@tgrall#Devoxx #sparkstreaming
Spark Resilient Distributed Datasets
Sensor RDD
W
Executor
P4
W
Executor
P1 P3
W
Executor
P2
sc.textFile P1
8213034705, 95,
2.927373,
jake7870, 0……
P2
8213034705,
115, 2.943484,
Davidbresler2,
1….
P3
8213034705,
100, 2.951285,
gladimacowgirl,
58…
P4
8213034705,
117, 2.998947,
daysrus, 95….
@tgrall#Devoxx #sparkstreaming
Spark Resilient Distributed Datasets
Transformation
Filter()
Action
Count()
RDD
newRDD
Value
@tgrall#Devoxx #sparkstreaming
Spark Streaming
Spark SQL
Spark Streaming
(Streaming)
MLlib
(Machine Learning)
Spark Core (General execution engine)
GraphX
(Graph Computation)
@tgrall#Devoxx #sparkstreaming
What is Streaming?
• Data Stream:
• Unbounded sequence of data arriving continuously
• Stream processing:
• Low latency processing, querying, and analyzing of real time
streaming data
@tgrall#Devoxx #sparkstreaming
Why Spark Streaming
• Many applications must process
streaming data
• With the following Requirements:
• Results in near-real-time
• Handle large workloads
• latencies of few seconds
• Use Cases
• Website statistics, monitoring
• IoT
• Fraud detection
• Social network trends
• Advertising click monetization
put
put
put
put
Time stamped data
data
• Sensor, System Metrics, Events, log files
• Stock Ticker, User Activity
• Hi Volume, Velocity
Data for real-time
monitoring
@tgrall#Devoxx #sparkstreaming
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream
processing of live data
• Extension of the core Spark
Data Sources Data Sinks
@tgrall#Devoxx #sparkstreaming
Spark Streaming Architecture
• Divide data stream into batches of X seconds
• Called DStream = sequence of RDDs
Spark
Streaming
input data
stream
DStream RDD batches
Batch
interval
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
@tgrall#Devoxx #sparkstreaming
Process DStream
• Process using transformations
• creates new RDDs
transform
Transform
map
reduceByValue
count
DStream
RDDs
Dstream
RDDs
transformtransform
data from
time 0 to 1
data from
time 1 to 2
RDD @ time 2
data from
time 2 to 3
RDD @ time 3RDD @ time 1
RDD @ time 1 RDD @ time 2 RDD @ time 3
@tgrall#Devoxx #sparkstreaming
Time Series
Data for
real-time monitoring
read
Sensor
Time stamped data
HBase
Processing
data
@tgrall#Devoxx #sparkstreaming
Lab “flow”
@tgrall#Devoxx #sparkstreaming
Convert Line of CSV data to Sensor
Object
case class Sensor(resid: String, date: String, time: String,
hz: Double, disp: Double, flo: Double, sedPPM: Double,
psi: Double, chlPPM: Double)
def parseSensor(str: String): Sensor = {
val p = str.split(",")
Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
p(6).toDouble, p(7).toDouble, p(8).toDouble)
}
@tgrall#Devoxx #sparkstreaming
Create a DStream
val ssc = new StreamingContext(sparkConf, Seconds(2))
val linesDStream = ssc.textFileStream(“/mapr/stream")
batch
time 0-1
linesDStream
batch
time 1-2
batch
time 1-2
DStream: a sequence of RDDs representing a
stream of data
stored in memory as an
RDD
@tgrall#Devoxx #sparkstreaming
Process DStream
val linesDStream = ssc.textFileStream(”directory path")
val sensorDStream = linesDStream.map(parseSensor)
map
new RDDs created for
every batch
batch
time 0-1
linesDStream RDDs
sensorDstream RDDs
batch
time 1-2
mapmap
batch
time 1-2
@tgrall#Devoxx #sparkstreaming
Save to HBase
rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)
Put objects written
To HBase
batch
time 0-1
linesRDD DStream
sensorRDD Dstream
batch
time 1-2
map
batch
time 1-2
HBase
save save save
output operation: persist data to external storage
map map
@tgrall#Devoxx #sparkstreaming
Go !
@tgrall#Devoxx #sparkstreaming
Cloud Access
• user01 … user49 password : mapr
Host/IP User ID
> 54.177.24.77 userX1 | userX6
> 54.193.135.223 userX2 | userX7
> 54.177.75.37 userX3 | userX8
> 54.177.36.88 userX4 | userX9
> 50.18.18.129 userX5 | userX0

Contenu connexe

En vedette

Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Spark Summit
 

En vedette (20)

Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
 
DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
Pivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew Ray
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Spark + HBase
Spark + HBase Spark + HBase
Spark + HBase
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
 
Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy Ryza
 

Plus de Tugdual Grall

Opensourceday 2014-iot
Opensourceday 2014-iotOpensourceday 2014-iot
Opensourceday 2014-iot
Tugdual Grall
 

Plus de Tugdual Grall (20)

Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015
 
Introduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi WorkshopIntroduction to NoSQL with MongoDB - SQLi Workshop
Introduction to NoSQL with MongoDB - SQLi Workshop
 
Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications Enabling Telco to Build and Run Modern Applications
Enabling Telco to Build and Run Modern Applications
 
MongoDB and Hadoop
MongoDB and HadoopMongoDB and Hadoop
MongoDB and Hadoop
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
 
Drop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema DesignDrop your table ! MongoDB Schema Design
Drop your table ! MongoDB Schema Design
 
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
Devoxx 2014 : Atelier MongoDB - Decouverte de MongoDB 2.6
 
Some cool features of MongoDB
Some cool features of MongoDBSome cool features of MongoDB
Some cool features of MongoDB
 
Building Your First MongoDB Application
Building Your First MongoDB ApplicationBuilding Your First MongoDB Application
Building Your First MongoDB Application
 
Opensourceday 2014-iot
Opensourceday 2014-iotOpensourceday 2014-iot
Opensourceday 2014-iot
 
Neotys conference
Neotys conferenceNeotys conference
Neotys conference
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with Couchbase
 
Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with Couchbase
 
Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?Why and How to integrate Hadoop and NoSQL?
Why and How to integrate Hadoop and NoSQL?
 
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Devoxx-2015 Hands-On - Time Series with Spark & HBase

Notes de l'éditeur

  1. Resilient distributed datasets, or RDD, are the primary abstraction in Spark. They are a collection of objects that is distributed across nodes in a cluster, and data operations are performed on RDD. Once created, RDD are immutable. You can also persist, or cache, RDDs in memory or on disk. Spark RDDs are fault-tolerant. If a given node or task fails, the RDD can be reconstructed automatically on the remaining nodes and the job will complete.
  2. There are two types of data operations you can perform on an RDD, transformations and actions.   A transformation will return an RDD. Since RDD are immutable, the transformation will return a new RDD.   An action will return a value.