Stream Processing in 2019

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Chris Horder – AWS Solutions Architect
3rd July 2019

Typical credit card transaction
(Diagram: User, Mobile client, Merchant, Credit Card Institution)

Data is produced continuously
Sources: Mobile Apps, Web Clickstream, Application Logs, Metering Records, IoT Sensors, Smart Cities
Example application log entry:
[Wed Oct 11 14:32:52 2018] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

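The diminishing value of data over time
(Chart: value of data to decision-making plotted against data age, from months and days through hours, minutes and seconds to real time. From oldest to freshest the categories are Historical, Reactive, Actionable and Preventive/Predictive; traditional "batch" business intelligence sits at the historical end, information for time-critical decisions at the real-time end.)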

Customer examples
• 1 billion events per week from connected devices
• Near-real-time home valuation (Zestimates)
• Live clickstream dashboards refreshed in under 10 sec.
• 50 billion daily ad impressions, sub-50-ms responses
• Facilitate communications between 100+ microservices
• Analyse billions of network flows in real time
• IoT predictive analytics
• Analyse game events in near real time

What do we need from our streaming architecture?
• Event driven
• Source for batch
• Consumption models
• Schema flexibility
• Microservices
• Loosely coupled
• Elasticity
• Horizontal scalability
• Operations

Components of a streaming architecture
(Diagram: many producers write into a message buffer organised into topics (Topic A, Topic B); many consumers read from the buffer; a schema repository sits alongside.)
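To make the roles in the diagram concrete, here is a minimal in-memory sketch in Java. The class names `MessageBuffer` and `TopicConsumer` are hypothetical, everything runs in one process, and the schema repository is not modelled. It only shows the core idea: producers append to a topic, records stay in the buffer after they are read, and each consumer tracks its own offset, which is what keeps producers and consumers loosely coupled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "message buffer" box: each topic is an append-only
// log, and every consumer keeps its own read position (offset) per topic.
class MessageBuffer {
    private final Map<String, List<String>> topics = new HashMap<>();

    // A producer appends to the end of a topic; returns the new record's offset.
    synchronized long publish(String topic, String message) {
        List<String> log = topics.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(message);
        return log.size() - 1;
    }

    // Reads up to max records from offset onwards; reading never deletes records.
    synchronized List<String> read(String topic, long offset, int max) {
        List<String> log = topics.getOrDefault(topic, new ArrayList<>());
        int from = (int) Math.min(offset, log.size());
        int to = Math.min(from + max, log.size());
        return new ArrayList<>(log.subList(from, to));
    }
}

// A consumer only knows the buffer and a topic name, never the producers.
class TopicConsumer {
    private final MessageBuffer buffer;
    private final String topic;
    private long offset = 0; // owned by the consumer, so consumers progress independently

    TopicConsumer(MessageBuffer buffer, String topic) {
        this.buffer = buffer;
        this.topic = topic;
    }

    List<String> poll(int max) {
        List<String> batch = buffer.read(topic, offset, max);
        offset += batch.size();
        return batch;
    }
}
```

Two consumers of Topic A each see every record at their own pace. A real buffer such as Kinesis or Kafka adds durability, partitioning and retention limits on top of this idea.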

Checkpoint: Amazon SQS, Amazon Kinesis and Amazon Managed Streaming for Kafka
Amazon Simple Queue Service
• Traditional messaging semantics
• Transparent scaling
• Individual message delay
• Dead letter queues
Amazon Kinesis
• Multiple consumers
• Native AWS integrations
• Fully managed
• Control over ordering
• Highly configurable retention
Amazon Managed Streaming for Kafka / Kafka on Elastic Compute Cloud (EC2)
• Managed Kafka and ZooKeeper
• Existing applications
• Full configurability
• Log compaction
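The "Control over ordering" point for Kinesis rests on partition keys. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit hash key, and each shard owns a range of hash keys, so records with the same key land on the same shard and keep their order there. The sketch below assumes shards that split the hash key space into equal contiguous ranges (an assumption; resharding changes the ranges). `ShardRouter` is a hypothetical helper, not part of any AWS SDK.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ShardRouter {
    // The hash key space is 0 .. 2^128 - 1.
    private static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the partition key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java SE platform must provide MD5
        }
    }

    // Assumption: shardCount shards own equal, contiguous hash key ranges.
    static int shardFor(String partitionKey, int shardCount) {
        BigInteger rangeSize = KEY_SPACE.divide(BigInteger.valueOf(shardCount));
        int shard = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(shard, shardCount - 1); // last shard absorbs any rounding remainder
    }
}
```

Using `accountId` as the partition key (an illustrative choice) would keep each account's transactions in order, at the cost of hot shards if a few accounts dominate the traffic.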

Producers: AWS native
• AWS Database Migration Service
• Kinesis Producer Library
• Kinesis Agent
• AWS IoT
• Amazon Connect
• Amazon Pinpoint
• Amazon DynamoDB Streams
• AWS Tools and SDKs
• Amazon API Gateway
• Amazon EMR

Amazon Kinesis Data Streaming
Collect, process and analyse data streams in real time
• Real-time
• Fully managed
• Scalable
• Secure
• Cost effective
(Diagram: Amazon Kinesis Data Streams ingests and stores data streams; Amazon Kinesis Data Analytics aggregates, filters and enriches data; Amazon Kinesis Data Firehose egresses data streams. Destinations and consumers shown: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, Splunk, AWS Lambda, Amazon EMR/Spark and custom code on Amazon EC2.)

Apache Kafka anatomy 101
(Diagram: producers write to a Kafka cluster of brokers coordinated by ZooKeeper; a consumer reads from the cluster.)
Consumers
• Kinesis Client Library
• Kinesis Agent
• AWS Lambda
• Amazon EMR
• Amazon Kinesis Data Analytics
• Amazon Kinesis Data Firehose
• SDKs

Apache Flink
Framework and distributed engine for stateful processing of data streams.
• Simple programming: easy-to-use and flexible APIs make building apps fast
• High performance: in-memory computing provides low latency and high throughput
• Stateful processing: durable application state saves
• Strong data integrity: exactly-once processing and consistent state
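The "durable application state saves" and "exactly-once processing" points rest on one idea: save the operator state and the input position together, and after a failure restore both and replay the input from that position. The Java below is a conceptual, single-threaded sketch with hypothetical class names. It is not Flink's API; Flink takes distributed snapshots (checkpoints) across all operators and relies on a replayable source such as Kafka.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch, not Flink's API: a keyed running total whose checkpoint
// captures the state together with the input position that state reflects.
class KeyedSumOperator {
    record Event(String key, double amount) {}
    record Checkpoint(long offset, Map<String, Double> sums) {}

    private Map<String, Double> sums = new HashMap<>();
    private long offset = 0; // index of the next input event to process

    // Processes events from the current offset up to (but excluding) upTo.
    void process(List<Event> input, long upTo) {
        for (; offset < Math.min(upTo, input.size()); offset++) {
            Event e = input.get((int) offset);
            sums.merge(e.key(), e.amount(), Double::sum);
        }
    }

    // Copies the state so later updates cannot leak into the saved checkpoint.
    Checkpoint snapshot() {
        return new Checkpoint(offset, new HashMap<>(sums));
    }

    // After a failure: reset the state AND the input position, then replay from there.
    void restore(Checkpoint cp) {
        offset = cp.offset();
        sums = new HashMap<>(cp.sums());
    }

    Map<String, Double> sums() {
        return sums;
    }
}
```

In a run where the operator checkpoints after two events, processes a third, then crashes and is restored, the third event is processed twice in total but affects the restored state only once. That is exactly-once state consistency; writing results exactly once to an external system additionally needs transactional or idempotent sinks.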

Getting started
(Diagram: Amazon Managed Streaming for Kafka and Amazon Elasticsearch Service, with a sample transaction event:)
{
"accountId": "11185",
"transactionId": 1155776,
"isDetailAvailable": "true",
"type": "PAYMENT",
"status": "POSTED",
"description": "Credit card payment to Sample Shop",
"postingDateTime": 1553207924271,
"valueDateTime": "22-03-2019 09:38:44",
"executionDateTime": 1553207924271,
"amount": 767.8318,
"currency": "AUD",
"reference": "AKG62DHBB5",
"merchantName": "Sample shop : Purchase",
"merchantCategoryCode": "4112"
}
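The event above mixes timestamp formats: `postingDateTime` is epoch milliseconds while `valueDateTime` is a `dd-MM-yyyy HH:mm:ss` string, and the later sample uses strings throughout. Windowed queries need a single event-time representation, so a normalisation step like the sketch below can run before events reach the stream processor. `TimestampNormaliser` is a hypothetical helper, and it assumes the string timestamps are local Sydney time, which is only inferred from the AUD currency. Under that assumption the two fields of this event agree.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class TimestampNormaliser {
    // Pattern of the string timestamps in the sample events, e.g. "22-03-2019 09:38:44".
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");
    // Assumption: string timestamps are local time in Sydney (handles AEDT/AEST switches).
    static final ZoneId ZONE = ZoneId.of("Australia/Sydney");

    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    static Instant fromLocalString(String text) {
        return LocalDateTime.parse(text, FORMAT).atZone(ZONE).toInstant();
    }

    // Renders an instant in the same local format (sub-second precision is dropped).
    static String toLocalString(Instant instant) {
        return FORMAT.format(instant.atZone(ZONE));
    }
}
```

For example, `toLocalString(fromEpochMillis(1553207924271L))` gives `22-03-2019 09:38:44`, matching `valueDateTime`, because Sydney was on daylight time (UTC+11) on that date.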

Getting started: Infrastructure
(Diagram: in the AWS Cloud, a VPC spans Availability Zones 1, 2 and 3. Each Availability Zone holds one Amazon Managed Streaming for Kafka broker instance, and JMeter runs on Amazon ECS for Kubernetes worker nodes.)

Sample data
{
"accountId": "11158",
"transactionId": "126649",
"isDetailAvailable": true,
"type": "PAYMENT",
"status": "POSTED",
"description": "Credit card payment to AccountY",
"postingDateTime": "09-04-2019 17:17:44",
"valueDateTime": "09-04-2019 17:17:44",
"executionDateTime": "09-04-2019 17:17:44",
"amount": 534.24,
"currency": "AUD",
"reference": "BF6GCJ2EDE",
"merchantName": "MerchantA",
"merchantCategoryCode": "3150"
}

SQL over the stream
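import org.apache.flink.table.api.Table;

// Assumes tableEnv is a Flink StreamTableEnvironment with the transaction stream registered as "Transactions"
Table result1 = tableEnv.sqlQuery(
    "select transactionId, amount, accountId, currency, type " +
    "from Transactions " +
    "where currency = 'BAD' and amount > 995 and type = 'TRANSFER_OUTGOING'"
);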
• Familiar syntax
• Joins across streams
• Aggregations
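The `where` clause of the query above is evaluated independently for each arriving event, so a filter like this needs no state and can emit matches as soon as they arrive. A plain-Java equivalent, over a hypothetical `Transaction` record holding just the selected columns, looks like this:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class TransactionQueryFilter {
    // Hypothetical event type holding only the columns the query selects.
    record Transaction(String transactionId, double amount, String accountId,
                       String currency, String type) {}

    // Same condition as the WHERE clause of the Flink SQL query.
    static final Predicate<Transaction> MATCHES = t ->
            "BAD".equals(t.currency())
                    && t.amount() > 995
                    && "TRANSFER_OUTGOING".equals(t.type());

    // Each event is tested on its own; no earlier events need to be remembered.
    static List<Transaction> apply(List<Transaction> events) {
        return events.stream().filter(MATCHES).collect(Collectors.toList());
    }
}
```

Joins and aggregations are different: they must remember earlier events, which is why they are usually bounded by windows.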

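Aggregations and windowing
Window types: Tumbling, Sliding, Session (Flink SQL group windows TUMBLE, HOP and SESSION)
-- created_date must be declared as a time attribute (rowtime or proctime)
select TUMBLE_START(created_date, INTERVAL '20' SECOND) as wStart,
       TUMBLE_END(created_date, INTERVAL '20' SECOND) as wEnd,
       SUM(amount),
       AVG(amount),
       COUNT(*)
from Transactions
GROUP BY TUMBLE(created_date, INTERVAL '20' SECOND)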
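The three window types differ in how each event is assigned to windows. The hypothetical `WindowAssignment` helper below, with timestamps in milliseconds, follows the same assignment rules as Flink's event-time windows as we understand them: tumbling windows align to multiples of the size, a sliding window of length `size` starts every `slide`, and a session ends once the gap after its last event reaches `gap`. It is a sketch for intuition, not Flink code.

```java
import java.util.ArrayList;
import java.util.List;

class WindowAssignment {
    // Tumbling: fixed-size, non-overlapping windows aligned to epoch 0.
    static long tumblingStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Sliding: every window [start, start + size) with start a multiple of slide
    // that contains ts. Returned latest window first.
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long last = ts - Math.floorMod(ts, slide); // latest window start <= ts
        for (long start = last; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Session: each event opens [ts, ts + gap); overlapping windows merge.
    // Input must be sorted; returns {start, end} pairs.
    static List<long[]> sessions(long[] sortedTs, long gap) {
        List<long[]> result = new ArrayList<>();
        if (sortedTs.length == 0) {
            return result;
        }
        long start = sortedTs[0];
        long end = sortedTs[0];
        for (int i = 1; i < sortedTs.length; i++) {
            if (sortedTs[i] - end >= gap) { // gap of inactivity closes the session
                result.add(new long[] {start, end + gap});
                start = sortedTs[i];
            }
            end = sortedTs[i];
        }
        result.add(new long[] {start, end + gap});
        return result;
    }
}
```

With the 20-second tumbling window from the query, an event at 45 s falls into [40 s, 60 s); with 20-second windows sliding every 10 s, it falls into both [40 s, 60 s) and [30 s, 50 s).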

Categorise transactions from a stream
(Diagram components: mobile client, users, Amazon API Gateway, Amazon Managed Streaming for Kafka (shown twice, as input and output), Amazon SageMaker, Amazon DynamoDB, Amazon Simple Notification Service.)

Grow with your requirements
(Diagram components: mobile client, Amazon API Gateway, Amazon Managed Streaming for Kafka, AWS Lambda, Amazon Simple Storage Service (S3), Amazon Athena, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, Amazon Neptune, Amazon EC2, and an analyst on a desktop.)

Data61 - CSIRO
• Standards developed as part of the introduction in Australia of the Consumer Data Right legislation to give Australians greater control over their data
• The Consumer Data Right is intended to apply sector by sector across the whole economy, beginning in the banking, energy and telecommunications sectors.
• https://consumerdatastandardsaustralia.github.io

Resources
• Free online training - https://www.aws.training
• Flink Homepage - https://flink.apache.org/
• AWS Big Data Blog - https://aws.amazon.com/blogs/big-data/
  • Real-time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network
• Confluent - https://www.confluent.io/blog/
• Designing Data-Intensive Applications by Martin Kleppmann
