SlideShare une entreprise Scribd logo
1  sur  49
November 10th, 2015
Herndon, VA
Real-Time Event Processing
Agenda Overview
8:30 AM Welcome
8:45 AM Big Data in the Cloud
10:00 AM Break
10:15 AM Data Collection and Storage
11:30 AM Break
11:45 AM Real-time Event Processing
1:00 PM Lunch
1:30 PM HPC in the Cloud
2:45 PM Break
3:00 PM Processing and Analytics
4:15 PM Close
Primitive Patterns
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Real-Time Event Processing
• Event-driven programming
• Trigger action based on real-time input
Examples:
 Proactively detect errors in logs and devices
 Identify abnormal activity
 Monitor performance SLAs
 Notify when SLAs/performance drops below a threshold
Two main processing patterns
• Stream processing (real time)
 Real-time response to events in data streams
 Relatively simple data computations (aggregates, filters, sliding window)
• Micro-batching (near real time)
 Near real-time operations on small batches of events in data streams
 Standard processing and query engine for analysis
You’re likely already “streaming”
• Sensor networks analytics
• Network analytics
• Log shipping and centralization
• Click stream analysis
• Gaming status
• Hardware and software appliance metrics
• …more…
• Any proxy layer B organizing and passing data from A to C
 A to B to C
Amazon Kinesis
Kinesis Stream
• Streams are made of Shards
• Each Shard ingests data up to 1MB/sec,
and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by splitting or
merging Shards
• Replay data inside of 24Hr. Window
• Extensible to up to 7 days
Kinesis Stream & Shards
Amazon Kinesis
Why Stream Storage?
• Decouple producers &
consumers
• Temporary buffer
• Preserve client ordering
• Streaming MapReduce
4 4 3 3 2 2 1 14 3 2 1
4 3 2 1
4 3 2 1
4 3 2 1
4 4 3 3 2 2 1 1
Producer 1
Shard or Partition 1
Shard or Partition 2
Consumer 1
Count of
Red = 4
Count of
Violet = 4
Consumer 2
Count of
Blue = 4
Count of
Green = 4
Producer 2
Producer 3
Producer N
Key = Red
Key = Green
How to Size your Kinesis Stream - Ingress
Suppose 2 Producers, each producing 2KB records at 500 KB/s:
Minimum Requirement: Ingress Capacity of 2 MB/s, Egress Capacity of 2MB/s
A theoretical minimum of 2 shards is required which will provide an ingress capacity of
2MB/s, and egress capacity 4 MB/s
Shard
Shard
1 MB/S
2 KB * 500 TPS = 1000KB/s
1 MB/S
2 KB * 500 TPS = 1000KB/s
Payment Processing Application
Producers
Theoretical Minimum of 2 Shards Required
How to Size your Kinesis Stream - Egress
Records are durably stored in Kinesis for 24 hours, allowing for multiple consuming
applications to process the data
Let’s extend the same example to have 3 consuming applications:
If all applications are reading at the ingress rate of 1MB/s per shard, an aggregate read capacity of 6
MB/s is required, exceeding the shard’s egress limit of 4MB/s
Solution: Simple! Add another shard to the stream to spread the load
Shard
Shard
1 MB/S
2 KB * 500 TPS = 1000KB/s
1 MB/S
2 KB * 500 TPS = 1000KB/s
Payment Processing Application
Fraud Detection Application
Recommendation Engine Application
Egress Bottleneck
Producers
• Producers use PutRecord or PutRecords call
to store data in a Stream.
• PutRecord {Data, StreamName, PartitionKey}
• A Partition Key is supplied by producer and used to
distribute the PUTs across Shards
• Kinesis MD5 hashes supplied partition key over the
hash key range of a Shard
• A unique Sequence # is returned to the Producer
upon a successful call
Putting Data into Kinesis
Simple Put interface to store data in Kinesis
Real-time event processing frameworks
Kinesis
Client
Library
AWS Lambda
Use Case: IoT Sensors
Remotely determine what a device
senses.
Lambda
KCL
Archive
Correlation
Analysis
Mobile
DeviceVarious
Sensors
HTTPS
Small
Thing
IoT Sensors - Trickles become a Stream
Persistent
Stream
A B C
Amazon
IoT
MQTT
Use Case: Trending Top Activity
Ad Network Logging Top-10 Detail - KCL
Global top-10
Elastic Beanstalk
foo-analysis.com
Data
Records Shard
KCL
Workers
My top-10
Kinesis
Stream
Kinesis
Application
Data Record
Shard:
Sequence Number
14 17 18 21 23
Ad Network Logging Top-10 Detail - Lambda
Global top-10
Elastic Beanstalk
foo-analysis.com
Data
Records Shard
Lambda
Workers
My top-10
Data Record
Shard:
Sequence Number
14 17 18 21 23
Kinesis
Stream
Amazon KCL
Kinesis Client Library
Kinesis Client Library (KCL)
• Distributed to handle multiple
shards
• Fault tolerant
• Elastically adjust to shard
count
• Helps with distributed
processing
• Develop in Java, Python, Ruby,
Node.js, .NET
KCL Design Components
• Worker:- Processing unit that maps to each application instance
• Record processor:- The processing unit that processes data from
a shard of a Kinesis stream
• Check-pointer: Keeps track of the records that have already
been processed in a given shard
• KCL restarts the processing of the shard at the last known
processed record if a worker fails
KCL restarts the processing of the shard at the
last known processed record if a worker fails
Processing with Kinesis Client Library
• Connects to the stream and enumerates the shards
• Instantiates a record processor for every shard managed
• Checkpoints processed records in Amazon DynamoDB
• Balances shard-worker associations when the worker
instance count changes
• Balances shard-worker associations when shards are
split or merged
Best practices for KCL applications
• Leverage EC2 Auto Scaling Groups for your KCL Application
• Move Data from an Amazon Kinesis stream to S3 for long-term persistence
 Use either Firehose or Build an “archiver” consumer application
• Leverage durable storage like DynamoDB or S3 for processed data prior to
check-pointing
• Duplicates: Ensure the authoritative data repository is resilient to
duplicates
• Idempotent processing: Build a deterministic/repeatable system that can
achieve idempotence processing through check-pointing
Amazon Kinesis connector application
Connector Pipeline
Transformed
Filtered
Buffered
Emitted
Incoming
Records
Outgoing to
Endpoints
Amazon Kinesis Connector
• Amazon S3
 Batch Write Files for Archive into S3
 Sequence Based File Naming
• Amazon Redshift
 Micro-batching load to Redshift with manifest support
 User Defined message transformers
• Amazon DynamoDB
 BatchPut append to table
 User defined message transformers
• Elasticsearch
 Put to Elasticsearch cluster
 User defined message transforms
S3 Dynamo DB Redshift
Kinesis
AWS Lambda
Event-Driven Compute in the Cloud
• Lambda functions: Stateless, request-driven code
execution
 Triggered by events in other services:
• PUT to an Amazon S3 bucket
• Write to an Amazon DynamoDB table
• Record in an Amazon Kinesis stream
 Makes it easy to…
• Transform data as it reaches the cloud
• Perform data-driven auditing, analysis, and notification
• Kick off workflows
No Infrastructure to Manage
• Focus on business logic, not infrastructure
• Upload your code; AWS Lambda handles:
 Capacity
 Scaling
 Deployment
 Monitoring
 Logging
 Web service front end
 Security patching
Automatic Scaling
• Lambda scales to match the event rate
• Don’t worry about over or under
provisioning
• Pay only for what you use
• New app or successful app, Lambda
matches your scale
Bring your own code
• Create threads and processes, run batch scripts or
other executables, and read/write files in /tmp
• Include any library with your Lambda
function code, even native libraries.
Fine-grained pricing
• Buy compute time in 100ms
increments
• Low request charge
• No hourly, daily, or monthly
minimums
• No per-device fees
Free Tier
1M requests and 400,000 GB-s of compute
Every month, every customer
Never pay
for idle
Data Triggers: Amazon S3
Amazon S3 Bucket Events AWS Lambda
Satelitte image Thumbnail of Satellite image
1
2
3
Data Triggers: Amazon DynamoDB
AWS LambdaAmazon DynamoDB
Table and Stream
Send Amazon SNS
notifications
Update another
table
Calling Lambda Functions
• Call from mobile or web apps
 Wait for a response or send an event and continue
 AWS SDK, AWS Mobile SDK, REST API, CLI
• Send events from Amazon S3 or SNS:
 One event per Lambda invocation, 3 attempts
• Process DynamoDB changes or Amazon Kinesis
records as events:
 Ordered model with multiple records per event
 Unlimited retries (until data expires)
Writing Lambda Functions
• The Basics
 Stock Node.js, Java, Python
 AWS SDK comes built in and ready to use
 Lambda handles inbound traffic
• Stateless
 Use S3, DynamoDB, or other Internet storage for persistent data
 Don’t expect affinity to the infrastructure (you can’t “log in to the box”)
• Familiar
 Use processes, threads, /tmp, sockets, …
 Bring your own libraries, even native ones
How can you use these features?
“I want to send
customized
messages to
different users”
SNS + Lambda
“I want to send
an offer when a
user runs out of
lives in my game”
Amazon Cognito +
Lambda + SNS
“I want to
transform the
records in a click
stream or an IoT
data stream”
Amazon Kinesis +
Lambda
Stream Processing
Apache Spark
Apache Storm
Amazon EMR
Amazon EMR integration
Read Data Directly into Hive, Pig,
Streaming and Cascading
• Real time sources into Batch
Oriented Systems
• Multi-Application Support
and Check-pointing
CREATE TABLE call_data_records (
start_time bigint,
end_time bigint,
phone_number STRING,
carrier STRING,
recorded_duration bigint,
calculated_duration bigint,
lat double,
long double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
STORED BY
'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"=”MyTestStream");
Amazon EMR integration: Hive
Processing Amazon Kinesis streams
Amazon
Kinesis
EMR with
Spark Streaming
• Higher level abstraction called Discretized
Streams of DStreams
• Represented as a sequence of Resilient
Distributed Datasets (RDDs)
DStream
RDD@T1 RDD@T2
Messages
Receiver
Spark Streaming – Basic concepts
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
Apache Spark Streaming
• Window based transformations
 countByWindow, countByValueAndWindow etc.
• Scalability
 Partition input stream
 Each receiver can be run on separate worker
• Fault tolerance
 Write Ahead Log (WAL) support for Streaming
 Stateful exactly-once semantics
• Flexibility of running what you want
• EC2/etc – wordsmith here
Apache Storm
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• Integration with queuing system
• Higher level abstractions
Apache Storm: Basic Concepts
• Streams: Unbounded sequence of tuples
• Spout: Source of Stream
• Bolts :Processes input streams and output new streams
• Topologies :Network of spouts and bolts
https://github.com/awslabs/kinesis-storm-spout
Launches
Workers
Worker
ProcessesMaster Node
Cluster
Coordination
Storm architecture
Worker
Nimbus
Zookeeper
Zookeeper
Zookeeper
Supervisor
Supervisor
Supervisor
Supervisor
Worker
Worker
Worker
Real-time: Event-based processing
Kinesis
Storm
Spout
Producer
Amazon
Kinesis
Apache Storm
ElastiCache
(Redis) Node.js Client
(D3)
http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a-
Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
You’re likely already “streaming”
• Embrace “stream thinking”
• Event Processing tools are available
that will help increase your solutions’
functionality, availability and
durability
Thank You
Questions?

Contenu connexe

Tendances

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 

Tendances (20)

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Allyourbase
AllyourbaseAllyourbase
Allyourbase
 
RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Secrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on KubernetesSecrets of Performance Tuning Java on Kubernetes
Secrets of Performance Tuning Java on Kubernetes
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 

Similaire à Real-Time Event Processing

Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Amazon Web Services
 

Similaire à Real-Time Event Processing (20)

Raleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaRaleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS Lambda
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
 
SMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaSMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS Lambda
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
AWS Real-Time Event Processing
AWS Real-Time Event ProcessingAWS Real-Time Event Processing
AWS Real-Time Event Processing
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Processamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de usoProcessamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de uso
 
A Walk in the Cloud with AWS Lambda
A Walk in the Cloud with AWS LambdaA Walk in the Cloud with AWS Lambda
A Walk in the Cloud with AWS Lambda
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Real-Time Event Processing

  • 1. November 10th, 2015 Herndon, VA Real-Time Event Processing
  • 2. Agenda Overview 8:30 AM Welcome 8:45 AM Big Data in the Cloud 10:00 AM Break 10:15 AM Data Collection and Storage 11:30 AM Break 11:45 AM Real-time Event Processing 1:00 PM Lunch 1:30 PM HPC in the Cloud 2:45 PM Break 3:00 PM Processing and Analytics 4:15 PM Close
  • 3. Primitive Patterns Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 4. Real-Time Event Processing • Event-driven programming • Trigger action based on real-time input Examples:  Proactively detect errors in logs and devices  Identify abnormal activity  Monitor performance SLAs  Notify when SLAs/performance drops below a threshold
  • 5. Two main processing patterns • Stream processing (real time)  Real-time response to events in data streams  Relatively simple data computations (aggregates, filters, sliding window) • Micro-batching (near real time)  Near real-time operations on small batches of events in data streams  Standard processing and query engine for analysis
  • 6. You’re likely already “streaming” • Sensor networks analytics • Network analytics • Log shipping and centralization • Click stream analysis • Gaming status • Hardware and software appliance metrics • …more… • Any proxy layer B organizing and passing data from A to C  A to B to C
  • 8. • Streams are made of Shards • Each Shard ingests data up to 1MB/sec, and up to 1000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by splitting or merging Shards • Replay data inside of 24Hr. Window • Extensible to up to 7 days Kinesis Stream & Shards
  • 9. Amazon Kinesis Why Stream Storage? • Decouple producers & consumers • Temporary buffer • Preserve client ordering • Streaming MapReduce 4 4 3 3 2 2 1 14 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 4 3 3 2 2 1 1 Producer 1 Shard or Partition 1 Shard or Partition 2 Consumer 1 Count of Red = 4 Count of Violet = 4 Consumer 2 Count of Blue = 4 Count of Green = 4 Producer 2 Producer 3 Producer N Key = Red Key = Green
  • 10. How to Size your Kinesis Stream - Ingress Suppose 2 Producers, each producing 2KB records at 500 KB/s: Minimum Requirement: Ingress Capacity of 2 MB/s, Egress Capacity of 2MB/s A theoretical minimum of 2 shards is required which will provide an ingress capacity of 2MB/s, and egress capacity 4 MB/s Shard Shard 1 MB/S 2 KB * 500 TPS = 1000KB/s 1 MB/S 2 KB * 500 TPS = 1000KB/s Payment Processing Application Producers Theoretical Minimum of 2 Shards Required
  • 11. How to Size your Kinesis Stream - Egress Records are durably stored in Kinesis for 24 hours, allowing for multiple consuming applications to process the data Let’s extend the same example to have 3 consuming applications: If all applications are reading at the ingress rate of 1MB/s per shard, an aggregate read capacity of 6 MB/s is required, exceeding the shard’s egress limit of 4MB/s Solution: Simple! Add another shard to the stream to spread the load Shard Shard 1 MB/S 2 KB * 500 TPS = 1000KB/s 1 MB/S 2 KB * 500 TPS = 1000KB/s Payment Processing Application Fraud Detection Application Recommendation Engine Application Egress Bottleneck Producers
  • 12. • Producers use PutRecord or PutRecords call to store data in a Stream. • PutRecord {Data, StreamName, PartitionKey} • A Partition Key is supplied by producer and used to distribute the PUTs across Shards • Kinesis MD5 hashes supplied partition key over the hash key range of a Shard • A unique Sequence # is returned to the Producer upon a successful call Putting Data into Kinesis Simple Put interface to store data in Kinesis
  • 13. Real-time event processing frameworks Kinesis Client Library AWS Lambda
  • 14. Use Case: IoT Sensors Remotely determine what a device senses.
  • 15. Lambda KCL Archive Correlation Analysis Mobile DeviceVarious Sensors HTTPS Small Thing IoT Sensors - Trickles become a Stream Persistent Stream A B C Amazon IoT MQTT
  • 16. Use Case: Trending Top Activity
  • 17. Ad Network Logging Top-10 Detail - KCL Global top-10 Elastic Beanstalk foo-analysis.com Data Records Shard KCL Workers My top-10 Kinesis Stream Kinesis Application Data Record Shard: Sequence Number 14 17 18 21 23
  • 18. Ad Network Logging Top-10 Detail - Lambda Global top-10 Elastic Beanstalk foo-analysis.com Data Records Shard Lambda Workers My top-10 Data Record Shard: Sequence Number 14 17 18 21 23 Kinesis Stream
  • 20. Kinesis Client Library (KCL) • Distributed to handle multiple shards • Fault tolerant • Elastically adjust to shard count • Helps with distributed processing • Develop in Java, Python, Ruby, Node.js, .NET
  • 21. KCL Design Components • Worker:- Processing unit that maps to each application instance • Record processor:- The processing unit that processes data from a shard of a Kinesis stream • Check-pointer: Keeps track of the records that have already been processed in a given shard • KCL restarts the processing of the shard at the last known processed record if a worker fails KCL restarts the processing of the shard at the last known processed record if a worker fails
  • 22. Processing with Kinesis Client Library • Connects to the stream and enumerates the shards • Instantiates a record processor for every shard managed • Checkpoints processed records in Amazon DynamoDB • Balances shard-worker associations when the worker instance count changes • Balances shard-worker associations when shards are split or merged
  • 23. Best practices for KCL applications • Leverage EC2 Auto Scaling Groups for your KCL Application • Move Data from an Amazon Kinesis stream to S3 for long-term persistence  Use either Firehose or Build an “archiver” consumer application • Leverage durable storage like DynamoDB or S3 for processed data prior to check-pointing • Duplicates: Ensure the authoritative data repository is resilient to duplicates • Idempotent processing: Build a deterministic/repeatable system that can achieve idempotence processing through check-pointing
  • 24. Amazon Kinesis connector application Connector Pipeline Transformed Filtered Buffered Emitted Incoming Records Outgoing to Endpoints
  • 25. Amazon Kinesis Connector • Amazon S3  Batch Write Files for Archive into S3  Sequence Based File Naming • Amazon Redshift  Micro-batching load to Redshift with manifest support  User Defined message transformers • Amazon DynamoDB  BatchPut append to table  User defined message transformers • Elasticsearch  Put to Elasticsearch cluster  User defined message transforms S3 Dynamo DB Redshift Kinesis
  • 27. Event-Driven Compute in the Cloud • Lambda functions: Stateless, request-driven code execution  Triggered by events in other services: • PUT to an Amazon S3 bucket • Write to an Amazon DynamoDB table • Record in an Amazon Kinesis stream  Makes it easy to… • Transform data as it reaches the cloud • Perform data-driven auditing, analysis, and notification • Kick off workflows
  • 28. No Infrastructure to Manage • Focus on business logic, not infrastructure • Upload your code; AWS Lambda handles:  Capacity  Scaling  Deployment  Monitoring  Logging  Web service front end  Security patching
  • 29. Automatic Scaling • Lambda scales to match the event rate • Don’t worry about over or under provisioning • Pay only for what you use • New app or successful app, Lambda matches your scale
  • 30. Bring your own code • Create threads and processes, run batch scripts or other executables, and read/write files in /tmp • Include any library with your Lambda function code, even native libraries.
  • 31. Fine-grained pricing • Buy compute time in 100ms increments • Low request charge • No hourly, daily, or monthly minimums • No per-device fees Free Tier 1M requests and 400,000 GB-s of compute Every month, every customer Never pay for idle
  • 32. Data Triggers: Amazon S3 Amazon S3 Bucket Events AWS Lambda Satelitte image Thumbnail of Satellite image 1 2 3
  • 33. Data Triggers: Amazon DynamoDB AWS LambdaAmazon DynamoDB Table and Stream Send Amazon SNS notifications Update another table
  • 34. Calling Lambda Functions • Call from mobile or web apps  Wait for a response or send an event and continue  AWS SDK, AWS Mobile SDK, REST API, CLI • Send events from Amazon S3 or SNS:  One event per Lambda invocation, 3 attempts • Process DynamoDB changes or Amazon Kinesis records as events:  Ordered model with multiple records per event  Unlimited retries (until data expires)
  • 35. Writing Lambda Functions • The Basics  Stock Node.js, Java, Python  AWS SDK comes built in and ready to use  Lambda handles inbound traffic • Stateless  Use S3, DynamoDB, or other Internet storage for persistent data  Don’t expect affinity to the infrastructure (you can’t “log in to the box”) • Familiar  Use processes, threads, /tmp, sockets, …  Bring your own libraries, even native ones
  • 36. How can you use these features? “I want to send customized messages to different users” SNS + Lambda “I want to send an offer when a user runs out of lives in my game” Amazon Cognito + Lambda + SNS “I want to transform the records in a click stream or an IoT data stream” Amazon Kinesis + Lambda
  • 38. Amazon EMR integration Read Data Directly into Hive, Pig, Streaming and Cascading • Real time sources into Batch Oriented Systems • Multi-Application Support and Check-pointing
  • 39. CREATE TABLE call_data_records ( start_time bigint, end_time bigint, phone_number STRING, carrier STRING, recorded_duration bigint, calculated_duration bigint, lat double, long double ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler' TBLPROPERTIES("kinesis.stream.name"=”MyTestStream"); Amazon EMR integration: Hive
  • 40. Processing Amazon Kinesis streams Amazon Kinesis EMR with Spark Streaming
  • 41. • Higher level abstraction called Discretized Streams of DStreams • Represented as a sequence of Resilient Distributed Datasets (RDDs) DStream RDD@T1 RDD@T2 Messages Receiver Spark Streaming – Basic concepts http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
  • 42. Apache Spark Streaming • Window based transformations  countByWindow, countByValueAndWindow etc. • Scalability  Partition input stream  Each receiver can be run on separate worker • Fault tolerance  Write Ahead Log (WAL) support for Streaming  Stateful exactly-once semantics
  • 43. • Flexibility of running what you want • EC2/etc – wordsmith here
  • 44. Apache Storm • Guaranteed data processing • Horizontal scalability • Fault-tolerance • Integration with queuing system • Higher level abstractions
  • 45. Apache Storm: Basic Concepts • Streams: Unbounded sequence of tuples • Spout: Source of Stream • Bolts :Processes input streams and output new streams • Topologies :Network of spouts and bolts https://github.com/awslabs/kinesis-storm-spout
  • 47. Real-time: Event-based processing Kinesis Storm Spout Producer Amazon Kinesis Apache Storm ElastiCache (Redis) Node.js Client (D3) http://blogs.aws.amazon.com/bigdata/post/Tx36LYSCY2R0A9B/Implement-a- Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
  • 48. You’re likely already “streaming” • Embrace “stream thinking” • Event Processing tools are available that will help increase your solutions’ functionality, availability and durability