SERHAT CAN • @SRHTCN
AWS Kinesis
Table of Contents
Streaming data?
Big Data Processing Approaches
AWS Kinesis Family
Amazon Kinesis Streams in detail
Amazon Kinesis Firehose
Amazon Kinesis Analytics
Streaming Data: Life As It Happens
After the event occurs -> at rest (batch)
As the event occurs -> in motion (streaming)
Big Data Processing Approaches
• Common Big Data Processing Approaches
• Query Engine Approach (Data Warehouse, SQL, NoSQL Databases)
• Repeated queries over the same well-structured data
• Pre-computations like indices and dimensional views improve query performance
• Batch Engines (Map-Reduce)
• The “query” runs directly on the raw data; there are no pre-computations.
• Streaming Big Data Processing Approach
• Real-time response to content in semi-structured data streams
• Relatively simple computations on data (aggregates, filters, sliding window, etc.)
• Enables data lifecycle by moving data to different stores / open source systems
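The “sliding window” computations mentioned above can be sketched in Java, the language of the code snippets later in this deck. This is a minimal in-memory sketch, not part of any AWS library: `SlidingWindowSum` is a hypothetical helper that keeps the sum of the values seen in the last N milliseconds, assuming events arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps a running sum of the values seen in the last windowMillis milliseconds (hypothetical helper). */
class SlidingWindowSum {
    private final long windowMillis;
    private final Deque<long[]> events = new ArrayDeque<>(); // each entry: {timestampMillis, value}
    private long sum = 0;

    SlidingWindowSum(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Adds an event; timestamps are assumed to arrive in non-decreasing order. */
    void add(long timestampMillis, long value) {
        events.addLast(new long[] {timestampMillis, value});
        sum += value;
        evictUpTo(timestampMillis);
    }

    /** Sum of the values whose timestamp lies in (nowMillis - windowMillis, nowMillis]. */
    long sumAt(long nowMillis) {
        evictUpTo(nowMillis);
        return sum;
    }

    /** Drops events that have slid out of the window ending at nowMillis. */
    private void evictUpTo(long nowMillis) {
        while (!events.isEmpty() && events.peekFirst()[0] <= nowMillis - windowMillis) {
            sum -= events.pollFirst()[1];
        }
    }

    public static void main(String[] args) {
        SlidingWindowSum w = new SlidingWindowSum(60_000); // 1-minute window
        w.add(0, 5);
        w.add(30_000, 7);
        w.add(70_000, 1); // the event at t=0 leaves the window here
        System.out.println(w.sumAt(70_000)); // prints 8
    }
}
```

Each event is added and evicted once, so the cost per event is constant regardless of the window length.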
Kinesis Family
Amazon Kinesis Streams
• A fully managed service for real-time processing of high-volume streaming data.
• Kinesis can store and process terabytes of data an
hour from hundreds of thousands of sources.
• Data is replicated across multiple Availability Zones
to ensure high durability and availability.
Amazon Kinesis Streams Concepts
Shard
• Streams are made of Shards. A shard is the base
throughput unit of an Amazon Kinesis stream.
• One shard provides a capacity of 1 MB/s of data input and 2 MB/s of data output.
• One shard can support up to 1,000 PUT records per second.
• You can monitor shard-level metrics in Amazon Kinesis Streams.
• Add or remove shards from your stream dynamically
as your data throughput changes by resharding the
stream.
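As a rough sizing aid, the per-shard limits above translate into a shard count: take whichever of write bandwidth, write record rate, or read bandwidth needs the most shards. This sketch assumes traffic is spread evenly across shards (hot partition keys would need more); `ShardEstimator` and its parameters are hypothetical.

```java
/** Estimates a shard count from the per-shard limits on the slide (hypothetical helper). */
class ShardEstimator {
    static final double WRITE_MB_PER_SHARD = 1.0;     // 1 MB/s ingress per shard
    static final double READ_MB_PER_SHARD = 2.0;      // 2 MB/s egress per shard
    static final long WRITE_RECORDS_PER_SHARD = 1000; // 1,000 records/s ingress per shard

    static int shardsNeeded(double writeMBps, long writeRecordsPerSec, double readMBps) {
        int byWriteBytes = (int) Math.ceil(writeMBps / WRITE_MB_PER_SHARD);
        int byWriteRecords = (int) Math.ceil((double) writeRecordsPerSec / WRITE_RECORDS_PER_SHARD);
        int byReadBytes = (int) Math.ceil(readMBps / READ_MB_PER_SHARD);
        // The most demanding dimension decides; a stream always has at least one shard.
        return Math.max(1, Math.max(byWriteBytes, Math.max(byWriteRecords, byReadBytes)));
    }

    public static void main(String[] args) {
        // 3.5 MB/s in, 2,500 records/s in, two consumer apps each reading everything (7 MB/s out)
        System.out.println(shardsNeeded(3.5, 2500, 7.0)); // prints 4
    }
}
```

Note that read bandwidth is shared by all consumers of a shard, which is why the example counts both applications' reads.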
Data Record
• A record is the unit of data stored in an Amazon Kinesis stream.
• A record is composed of:
• a partition key,
• a sequence number, and
• a data blob (the data you want to send).
• The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
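A minimal sketch of the record's three parts, with the 1 MB blob limit enforced on the producer side before sending. The 256-character partition key limit is the documented Kinesis limit; `DataRecordSketch` itself is a hypothetical class, not the SDK's `Record` type.

```java
import java.nio.charset.StandardCharsets;

/** The three parts of a data record, with size limits checked (hypothetical class). */
final class DataRecordSketch {
    static final int MAX_BLOB_BYTES = 1024 * 1024;  // 1 MB data blob, after Base64 decoding
    static final int MAX_PARTITION_KEY_CHARS = 256; // partition keys are Unicode strings, 1-256 chars

    final String partitionKey;
    final String sequenceNumber; // assigned by the service on write; null before the record is stored
    final byte[] data;

    DataRecordSketch(String partitionKey, String sequenceNumber, byte[] data) {
        if (partitionKey == null || partitionKey.isEmpty()
                || partitionKey.codePointCount(0, partitionKey.length()) > MAX_PARTITION_KEY_CHARS) {
            throw new IllegalArgumentException("partition key must be 1-256 characters");
        }
        if (data.length > MAX_BLOB_BYTES) {
            throw new IllegalArgumentException("data blob exceeds 1 MB: " + data.length + " bytes");
        }
        this.partitionKey = partitionKey;
        this.sequenceNumber = sequenceNumber;
        this.data = data;
    }

    public static void main(String[] args) {
        DataRecordSketch r = new DataRecordSketch("user-42", null,
                "{\"event\":\"login\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(r.partitionKey + " -> " + r.data.length + " bytes");
    }
}
```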
Partition Key
• The partition key is used to segregate and route data records to different shards of a stream.
• A partition key is specified by your data producer while putting data into an Amazon Kinesis stream.
• Kinesis hashes each partition key with MD5 into a 128-bit integer and adds the record to the shard whose hash key range contains that value.
• For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) whose hash values fall into different shards' ranges, so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
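The routing rule can be reproduced locally. This sketch assumes a stream whose shards split the 128-bit hash key space evenly, as a newly created stream does; after resharding the ranges can be uneven, and you would instead look up the shard whose starting/ending hash key range contains the hash. `PartitionKeyRouter` is a hypothetical helper.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Maps partition keys to shards via an MD5 hash into a 128-bit key space (hypothetical helper). */
class PartitionKeyRouter {
    static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128); // 2^128 possible hash keys

    /** MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Index of the shard owning hashKey, assuming shardCount evenly split, contiguous ranges. */
    static int shardIndex(BigInteger hashKey, int shardCount) {
        BigInteger rangeSize = KEY_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey.divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // the last shard absorbs the division remainder
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Key A", "Key B"}) {
            System.out.println(key + " -> shard index " + shardIndex(hashKey(key), 2));
        }
    }
}
```

Two arbitrary keys may well hash into the same shard. A producer that needs a specific shard can choose keys accordingly, or set `ExplicitHashKey` on `PutRecord` to override the hash.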
Sequence Number
• Each data record has a sequence number that is unique within its
shard.
• The sequence number is assigned by Streams after you write to the
stream with client.putRecords or client.putRecord.
• Sequence numbers for the same partition key generally increase over
time; the longer the time period between write requests, the larger the
sequence numbers become.
Resharding the Stream
• Streams supports resharding, which enables you to adjust the number of
shards in your stream in order to adapt to changes in the rate of data flow
through the stream.
• There are two types of resharding operations: shard split and shard
merge.
• Shard split: divide a single shard into two shards.
• Shard merge: combine two shards into a single shard.
Resharding the Stream
• Resharding is always “pairwise”: you cannot split a shard into more than two shards, or merge more than two shards, in a single operation.
• Resharding is typically performed by an administrative application that is distinct from the producer (put) applications and the consumer (get) applications.
• The administrative application also needs a broader set of IAM permissions for resharding.
Splitting a Shard
• You specify how hash key values from the parent shard should be redistributed to the child shards.
• The possible hash key values for a given shard constitute a set of ordered, contiguous, non-negative integers. This range of possible hash key values is given by:
shard.getHashKeyRange().getStartingHashKey();
shard.getHashKeyRange().getEndingHashKey();
• When you split the shard, you specify a value in this range.
• That hash key value and all higher hash key values are distributed to one of the child shards.
• All the lower hash key values are distributed to the other child shard.
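The split point can be computed from the range strings returned by `shard.getHashKeyRange()`. This sketch picks the midpoint so that each child gets about half of the parent's keys; the resulting string is what you would pass as `NewStartingHashKey` to `SplitShard`. `ShardSplitter` is a hypothetical helper, and rounding the midpoint up is a choice made here so that both children stay non-empty.

```java
import java.math.BigInteger;

/** Computes a NewStartingHashKey that splits a shard's range in half (hypothetical helper). */
class ShardSplitter {
    static String evenSplitPoint(String startingHashKey, String endingHashKey) {
        BigInteger start = new BigInteger(startingHashKey);
        BigInteger end = new BigInteger(endingHashKey);
        if (end.compareTo(start) <= 0) {
            throw new IllegalArgumentException("a shard covering a single hash key cannot be split");
        }
        // Midpoint rounded up, so start < split <= end: keys >= split go to one child,
        // keys below it to the other, and neither child is empty.
        BigInteger split = start.add(end).add(BigInteger.ONE).shiftRight(1);
        return split.toString();
    }

    public static void main(String[] args) {
        // Range of a stream's only shard: 0 .. 2^128 - 1
        String start = "0";
        String end = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE).toString();
        String split = evenSplitPoint(start, end);
        System.out.println("lower child: " + start + " .. " + new BigInteger(split).subtract(BigInteger.ONE));
        System.out.println("upper child: " + split + " .. " + end);
    }
}
```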
Merging Two Shards
• In order to merge two shards, the shards must be adjacent.
• Two shards are considered adjacent if the union of the hash key ranges
for the two shards form a contiguous set with no gaps.
• To identify shards that are candidates for merging, filter out all shards that are in a CLOSED state.
• Shards that are OPEN (that is, not CLOSED) have an ending sequence number of null.
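The filtering and adjacency checks can be combined. This sketch uses a hypothetical `ShardInfo` stand-in for the SDK's `Shard` type (ID, hash key range as decimal strings, ending sequence number) and returns pairs of OPEN shards whose ranges touch, which could then be passed to `MergeShards`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Finds adjacent OPEN shards that are candidates for merging (hypothetical helper). */
class MergeCandidates {
    /** Minimal stand-in for a shard description. */
    static final class ShardInfo {
        final String shardId;
        final BigInteger startingHashKey;
        final BigInteger endingHashKey;
        final String endingSequenceNumber; // null while the shard is OPEN

        ShardInfo(String shardId, String start, String end, String endingSequenceNumber) {
            this.shardId = shardId;
            this.startingHashKey = new BigInteger(start);
            this.endingHashKey = new BigInteger(end);
            this.endingSequenceNumber = endingSequenceNumber;
        }
    }

    static List<String[]> adjacentOpenPairs(List<ShardInfo> shards) {
        List<ShardInfo> open = new ArrayList<>();
        for (ShardInfo s : shards) {
            if (s.endingSequenceNumber == null) { // skip CLOSED shards
                open.add(s);
            }
        }
        open.sort(Comparator.comparing((ShardInfo s) -> s.startingHashKey));
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < open.size(); i++) {
            ShardInfo a = open.get(i);
            ShardInfo b = open.get(i + 1);
            // Adjacent: the union of both ranges is contiguous, i.e. b starts right after a ends.
            if (a.endingHashKey.add(BigInteger.ONE).equals(b.startingHashKey)) {
                pairs.add(new String[] {a.shardId, b.shardId});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<ShardInfo> shards = Arrays.asList(
                new ShardInfo("shardId-000000000001", "0", "99", null),
                new ShardInfo("shardId-000000000002", "100", "199", null),
                new ShardInfo("shardId-000000000000", "0", "199", "4958")); // CLOSED parent
        for (String[] pair : adjacentOpenPairs(shards)) {
            System.out.println(pair[0] + " + " + pair[1]);
        }
    }
}
```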
After Resharding
• After you call a resharding operation, either splitShard or mergeShards, the stream status changes to UPDATING; wait for the stream to become ACTIVE again, just as after creating it.
• In the process of resharding, a parent shard transitions from an OPEN state to a CLOSED state to an EXPIRED state.
• When resharding completes, the stream is back in the ACTIVE state.
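Waiting for the stream is usually a polling loop. In this sketch the status lookup is a hypothetical `Supplier<String>`; in a real application it would call `DescribeStream` (or `DescribeStreamSummary`) and read the stream status. The poll interval and attempt limit are illustrative values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Polls a stream status until it reports ACTIVE (hypothetical helper). */
class WaitForActive {
    /** Returns the number of polls made, or throws if the stream is not ACTIVE within maxPolls. */
    static int awaitActive(Supplier<String> statusSource, long pollIntervalMillis, int maxPolls)
            throws InterruptedException {
        for (int poll = 1; poll <= maxPolls; poll++) {
            if ("ACTIVE".equals(statusSource.get())) {
                return poll;
            }
            Thread.sleep(pollIntervalMillis); // typically UPDATING while the reshard is in progress
        }
        throw new IllegalStateException("stream not ACTIVE after " + maxPolls + " polls");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated responses: UPDATING twice, then ACTIVE.
        Iterator<String> statuses = Arrays.asList("UPDATING", "UPDATING", "ACTIVE").iterator();
        System.out.println("ACTIVE after " + awaitActive(statuses::next, 100, 10) + " polls");
    }
}
```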
Retention Period
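• Data records are accessible for a default of 24 hours from the time they are added to a stream.
• The retention period is configurable in hourly increments.
• From 24 to 168 hours (1 to 7 days) at the time of this talk; AWS has since raised the maximum to 8,760 hours (365 days).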
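A small sketch of the retention rules: validating a setting against the 24 to 168 hour range quoted on this slide, and checking whether a record written at a given instant is still readable. `RetentionCheck` is hypothetical, and its constants reflect the limits quoted here rather than current AWS limits.

```java
import java.time.Duration;
import java.time.Instant;

/** Retention-period checks using the limits quoted on the slide (hypothetical helper). */
class RetentionCheck {
    static final int MIN_HOURS = 24;  // default retention period
    static final int MAX_HOURS = 168; // 7 days, the maximum quoted on the slide

    /** Validates a retention setting; only whole hours can be expressed, matching hourly increments. */
    static Duration retention(int hours) {
        if (hours < MIN_HOURS || hours > MAX_HOURS) {
            throw new IllegalArgumentException("retention must be 24-168 hours, got " + hours);
        }
        return Duration.ofHours(hours);
    }

    /** True while a record written at 'written' is still within the retention period at 'now'. */
    static boolean stillReadable(Instant written, Instant now, int retentionHours) {
        return now.isBefore(written.plus(retention(retentionHours)));
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2017-01-01T00:00:00Z");
        Instant now = written.plus(Duration.ofHours(30));
        System.out.println(stillReadable(written, now, 24)); // prints false
        System.out.println(stillReadable(written, now, 48)); // prints true
    }
}
```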
Amazon Kinesis Producer Library (KPL)
• The KPL is an easy-to-use, highly configurable library that helps you write to an Amazon Kinesis stream.
• Writes to one or more Amazon Kinesis streams with an automatic and configurable
retry mechanism
• Collects records and uses PutRecords to write multiple records to multiple shards
per request
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate
batched records on the consumer
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into
producer performance
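The KPL's collection step can be approximated by grouping records into `PutRecords`-sized batches. This sketch assumes the `PutRecords` limits of 500 records and 5 MB per request (counting data plus partition keys); it does not reproduce the KPL's aggregation format, retries, or rate limiting. `PutRecordsBatcher` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Groups records into batches that fit one PutRecords request (hypothetical helper). */
class PutRecordsBatcher {
    static final int MAX_RECORDS_PER_REQUEST = 500;
    static final long MAX_BYTES_PER_REQUEST = 5L * 1024 * 1024; // data plus partition keys

    static final class Entry {
        final String partitionKey;
        final byte[] data;

        Entry(String partitionKey, byte[] data) {
            this.partitionKey = partitionKey;
            this.data = data;
        }

        long size() {
            return data.length + partitionKey.getBytes(StandardCharsets.UTF_8).length;
        }
    }

    static List<List<Entry>> batch(List<Entry> entries) {
        List<List<Entry>> batches = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        long currentBytes = 0;
        for (Entry e : entries) {
            boolean full = current.size() == MAX_RECORDS_PER_REQUEST
                    || currentBytes + e.size() > MAX_BYTES_PER_REQUEST;
            if (full && !current.isEmpty()) { // start a new request
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e);
            currentBytes += e.size();
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            entries.add(new Entry("user-" + (i % 10), ("event " + i).getBytes(StandardCharsets.UTF_8)));
        }
        for (List<Entry> request : batch(entries)) {
            System.out.println("PutRecords request with " + request.size() + " records");
        }
    }
}
```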
Amazon Kinesis Client Library (KCL)
• Helps you develop a consumer application for Amazon Kinesis Streams.
• The KCL acts as an intermediary between your record processing logic and Streams.
• A KCL application instantiates a worker with configuration information, and then uses a record processor to process the data received from an Amazon Kinesis stream.
• You can run a KCL application on any number of instances. Multiple instances
of the same application coordinate on failures and load-balance dynamically.
• You can also have multiple KCL applications working on the same stream,
subject to throughput limits.
Amazon Kinesis Client Library (Life Saver)
Amazon Kinesis Client Library
• Connects to the stream
• Enumerates the shards
• Coordinates shard associations with other workers (if any)
• Instantiates a record processor for every shard it manages
• Pulls data records from the stream
• Pushes the records to the corresponding record processor
• Checkpoints processed records
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
Amazon Kinesis Client Library
• KCL uses a unique Amazon DynamoDB table to keep
track of the application's state
• KCL creates the table with a provisioned throughput of
10 reads per second and 10 writes per second
• Each row in the DynamoDB table represents a shard that
is being processed by your application. The hash key for
the table is the shard ID.
Amazon Kinesis Client Library
• In addition to the shard ID, each row also includes the following data:
• checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across
all shards in the stream.
• checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature,
this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record.
• leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by
another worker.
• leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held
by one worker at a time.
• leaseOwner: The worker that is holding this lease.
• ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last
time a checkpoint was written.
• parentShardId: Used to ensure that the parent shard is fully processed before processing starts on
the child shards. This ensures that records are processed in the same order they were put into the
stream.
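The role of `leaseCounter` is easiest to see in a small simulation. This is an in-memory sketch, not the KCL's implementation: the real KCL stores leases in DynamoDB and relies on conditional writes, which a synchronized compare-and-set stands in for here. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the KCL lease table, one row per shard (hypothetical class). */
class LeaseTableSketch {
    static final class Lease {
        final String leaseKey;  // shard ID
        String leaseOwner;      // worker currently holding the lease, or null
        long leaseCounter;      // incremented on every take (and, in the KCL, every renewal)
        String checkpoint;      // last checkpointed sequence number, or null

        Lease(String leaseKey) {
            this.leaseKey = leaseKey;
        }
    }

    private final Map<String, Lease> rows = new HashMap<>();

    synchronized void createLease(String shardId) {
        rows.putIfAbsent(shardId, new Lease(shardId));
    }

    /** Takes the lease only if its counter still equals the value the caller last observed. */
    synchronized boolean takeLease(String shardId, String worker, long expectedCounter) {
        Lease lease = rows.get(shardId);
        if (lease == null || lease.leaseCounter != expectedCounter) {
            return false; // someone renewed or took the lease in the meantime
        }
        lease.leaseOwner = worker;
        lease.leaseCounter++;
        return true;
    }

    /** Accepts a checkpoint only from the current owner holding the current counter value. */
    synchronized boolean checkpoint(String shardId, String worker, long heldCounter, String sequenceNumber) {
        Lease lease = rows.get(shardId);
        if (lease == null || !worker.equals(lease.leaseOwner) || lease.leaseCounter != heldCounter) {
            return false; // the lease now belongs to another worker: stop processing this shard
        }
        lease.checkpoint = sequenceNumber;
        return true;
    }

    synchronized Lease get(String shardId) {
        return rows.get(shardId);
    }

    public static void main(String[] args) {
        LeaseTableSketch table = new LeaseTableSketch();
        String shard = "shardId-000000000000";
        table.createLease(shard);
        table.takeLease(shard, "worker-1", 0); // counter 0 -> 1
        table.takeLease(shard, "worker-2", 1); // worker-1 stopped renewing; worker-2 takes over
        boolean accepted = table.checkpoint(shard, "worker-1", 1, "12345");
        System.out.println("worker-1 checkpoint accepted? " + accepted); // prints false
    }
}
```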
Using Shard Iterators
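• You retrieve records from the stream on a per-shard basis, starting from a shard iterator of one of these types:
• AT_SEQUENCE_NUMBER: start reading at the record with a specific sequence number
• AFTER_SEQUENCE_NUMBER: start reading right after a specific sequence number
• AT_TIMESTAMP: start reading at the first record written at or after a specific timestamp
• TRIM_HORIZON: start at the oldest record still available in the shard
• LATEST: start just after the most recent record, so only new records are read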
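To make the iterator types concrete, this sketch computes where reading would start in an in-memory list of one shard's records. It is a local simulation only (the real call is `GetShardIterator`); the `Rec` type and `startIndex` method are hypothetical, and sequence numbers are compared numerically as `BigInteger`s because they are long decimal strings.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

/** Local simulation of where each shard iterator type starts reading (hypothetical class). */
class ShardIteratorSketch {
    enum IteratorType { AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, AT_TIMESTAMP, TRIM_HORIZON, LATEST }

    /** One retained record of a shard: sequence number (decimal string) and arrival time. */
    static final class Rec {
        final String sequenceNumber;
        final long arrivalMillis;

        Rec(String sequenceNumber, long arrivalMillis) {
            this.sequenceNumber = sequenceNumber;
            this.arrivalMillis = arrivalMillis;
        }
    }

    /**
     * Index of the first record an iterator of the given type would return, for records in shard order.
     * 'seq' is used by the sequence-number types, 'timestampMillis' by AT_TIMESTAMP.
     * A result of records.size() means only records written later will be returned.
     */
    static int startIndex(List<Rec> records, IteratorType type, String seq, long timestampMillis) {
        switch (type) {
            case TRIM_HORIZON:
                return 0; // oldest record still retained
            case LATEST:
                return records.size(); // only records added after the iterator is created
            case AT_TIMESTAMP: {
                for (int i = 0; i < records.size(); i++) {
                    if (records.get(i).arrivalMillis >= timestampMillis) {
                        return i;
                    }
                }
                return records.size();
            }
            case AT_SEQUENCE_NUMBER:
            case AFTER_SEQUENCE_NUMBER: {
                BigInteger target = new BigInteger(seq); // compare numerically, not as strings
                for (int i = 0; i < records.size(); i++) {
                    int cmp = new BigInteger(records.get(i).sequenceNumber).compareTo(target);
                    if (cmp > 0 || (cmp == 0 && type == IteratorType.AT_SEQUENCE_NUMBER)) {
                        return i;
                    }
                }
                return records.size();
            }
            default:
                throw new IllegalArgumentException("unsupported iterator type: " + type);
        }
    }

    public static void main(String[] args) {
        List<Rec> shard = Arrays.asList(new Rec("100", 1_000), new Rec("200", 2_000), new Rec("300", 3_000));
        for (IteratorType type : IteratorType.values()) {
            System.out.println(type + " starts at index " + startIndex(shard, type, "200", 1_500));
        }
    }
}
```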
Recovering from Failures
• Record Processor Failure
• The worker invokes record processor methods using Java ExecutorService tasks.
• If a task fails, the worker retains control of the shard that the record processor was
processing.
• The worker starts a new record processor task to process that shard.
• Worker or Application Failure
• If a worker (or an instance of the Amazon Kinesis Streams application) fails, you should detect and handle the situation. Once the failed worker's leases expire, workers in other instances take over its shards.
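The record processor recovery described above can be sketched with a plain `ExecutorService`: submit the processor task and, if it fails, keep the shard and submit a new task. The `Runnable` processor and the attempt limit are assumptions for illustration; the KCL's actual worker loop is more involved.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Restarts a failed record-processor task for the same shard (hypothetical helper). */
class ProcessorSupervisor {
    /** Runs processorTask, starting a fresh task after each failure; returns the successful attempt. */
    static int runWithRestarts(ExecutorService executor, Runnable processorTask, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<?> future = executor.submit(processorTask);
            try {
                future.get(); // blocks until this processor task finishes or fails
                return attempt;
            } catch (ExecutionException e) {
                // The shard stays assigned to this worker; log and start a new task for it.
                System.err.println("attempt " + attempt + " failed: " + e.getCause());
            }
        }
        throw new IllegalStateException("record processor failed " + maxAttempts + " times");
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        int[] calls = {0};
        Runnable flakyProcessor = () -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("transient processing error");
            }
        };
        try {
            System.out.println("succeeded on attempt " + runWithRestarts(executor, flakyProcessor, 5));
        } finally {
            executor.shutdown();
        }
    }
}
```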
Handling Duplicate Records
(Idempotency)
• There are two primary reasons why records may be
delivered more than one time to your Amazon
Kinesis Streams application:
• producer retries
• consumer retries
• Your application must anticipate and appropriately
handle processing individual records multiple times.
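One common way to make processing idempotent is to embed a unique ID in each record on the producer side and skip IDs already seen on the consumer side. Producer retries can store the same payload under different sequence numbers, so the sequence number alone is not enough. This sketch keeps a bounded in-memory set of recent IDs; `DedupFilter` and the `eventId` convention are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Remembers recently processed event IDs and flags repeats (hypothetical helper). */
class DedupFilter {
    private final Map<String, Boolean> seen;

    DedupFilter(int capacity) {
        // Access-ordered map that evicts the least recently seen ID once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** True the first time an eventId is offered; false while a duplicate is still remembered. */
    boolean firstTime(String eventId) {
        return seen.put(eventId, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupFilter filter = new DedupFilter(10_000);
        for (String eventId : new String[] {"evt-1", "evt-2", "evt-1"}) { // evt-1 is redelivered
            System.out.println(eventId + (filter.firstTime(eventId) ? " processed" : " skipped (duplicate)"));
        }
    }
}
```

A bounded in-memory set forgets old IDs and is lost on restart, so durable designs usually make the final write itself idempotent, for example an upsert keyed by the event ID.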
Pricing
• Shard hour (1 MB/s ingress, 2 MB/s egress): $0.015
• PUT payload units (each record is billed in 25 KB units), per 1,000,000 units: $0.014
• Extended data retention (up to 7 days), per shard hour: $0.020
• DynamoDB charges apply if you use the KCL
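A rough monthly estimate from the prices above. This sketch assumes every record is billed in whole 25 KB PUT payload units (taken here as 25 * 1024 bytes), a constant record rate, and the listed prices; actual prices vary by region and over time, and DynamoDB costs for the KCL are not included. `KinesisCostEstimate` is hypothetical.

```java
/** Monthly cost estimate from the prices on the slide (hypothetical helper). */
class KinesisCostEstimate {
    static final double SHARD_HOUR_USD = 0.015;
    static final double PER_MILLION_PUT_UNITS_USD = 0.014;
    static final double EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020;
    static final int PUT_UNIT_BYTES = 25 * 1024; // one PUT payload unit (assumed 25 KiB)

    /** Whole PUT payload units for one record, rounding up to the next 25 KB chunk. */
    static long putUnitsPerRecord(int recordBytes) {
        return (recordBytes + PUT_UNIT_BYTES - 1) / PUT_UNIT_BYTES;
    }

    static double monthlyUsd(int shards, double recordsPerSecond, int recordBytes,
                             boolean extendedRetention, double hoursPerMonth) {
        double shardHours = shards * hoursPerMonth;
        double putUnits = recordsPerSecond * 3600 * hoursPerMonth * putUnitsPerRecord(recordBytes);
        double cost = shardHours * SHARD_HOUR_USD
                + putUnits / 1_000_000 * PER_MILLION_PUT_UNITS_USD;
        if (extendedRetention) {
            cost += shardHours * EXTENDED_RETENTION_SHARD_HOUR_USD;
        }
        return cost;
    }

    public static void main(String[] args) {
        // 2 shards, 1,000 records/s of 3 KB each, 720 hours (30 days), default retention
        System.out.printf("%.2f USD%n", monthlyUsd(2, 1000, 3 * 1024, false, 720));
    }
}
```

In this example the PUT payload units (about $36) cost more than the shard hours (about $22), which is typical for high record rates of small records.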
Kafka vs. Kinesis Streams
• In Kafka you can configure, for each topic, the replication factor and how many replicas have to acknowledge a message before it is considered successful. So you can definitely make it highly available.
• Amazon ensures that you won't lose data, but that comes with a performance cost (messages are written synchronously to 3 different Availability Zones).
• There are several benchmarks online comparing Kafka and Kinesis, and the result is always the same: you will have a hard time matching Kafka's performance with Kinesis, at least for a reasonable price.
• This is partly because Kafka is extremely fast, but also because Kinesis writes each message synchronously to 3 different machines, which is costly in terms of latency and throughput.
• Kafka is one of the preferred sources for Apache stream processing frameworks.
• Unsurprisingly, Kinesis is really well integrated with other AWS services
DynamoDB Streams vs. Kinesis Streams
• Although DynamoDB Streams actions are similar to their counterparts in Amazon Kinesis Streams, they are not 100% identical.
• You can write applications for Amazon Kinesis Streams using the Amazon Kinesis Client Library (KCL).
• You can leverage the design patterns found within the KCL to process DynamoDB Streams shards and stream records. To do this, you use the DynamoDB Streams Kinesis Adapter.
SQS vs. Kinesis Streams
• Amazon Kinesis Streams enables real-time
processing of streaming big data.
• It provides ordering of records, as well as the
ability to read and/or replay records in the same
order to multiple Amazon Kinesis Applications.
• The Amazon Kinesis Client Library (KCL)
delivers all records for a given partition key to
the same record processor, making it easier to
build multiple applications reading from the same
Amazon Kinesis stream (for example, to perform
counting, aggregation, and filtering).
• Amazon Simple Queue Service (Amazon SQS)
offers a reliable, highly scalable hosted queue
for storing messages as they travel between
computers.
• Amazon SQS lets you easily move data between
distributed application components and helps
you build applications in which messages are
processed independently (with message-level
ack/fail semantics), such as automated
workflows.
Amazon Kinesis Firehose
• Amazon Kinesis Firehose is the easiest way to load streaming data into AWS.
• It can capture, transform, and load streaming data into Amazon Kinesis
Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
• Fully managed service that automatically scales to match the throughput of
your data and requires no ongoing administration.
• It can also batch, compress, and encrypt the data before loading it,
minimizing the amount of storage used at the destination and increasing
security.
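Firehose's batch-and-compress behaviour can be approximated with a buffer that flushes on either a size or an age threshold and GZIP-compresses each batch. The thresholds and the `FirehoseStyleBuffer` class are assumptions for illustration, not Firehose's API; this sketch only checks the age threshold when a record arrives, whereas a real implementation would also flush on a timer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Size- or age-triggered buffer that emits GZIP-compressed batches (hypothetical class). */
class FirehoseStyleBuffer {
    private final int maxBytes;      // flush once this many uncompressed bytes are buffered
    private final long maxAgeMillis; // ...or once the oldest buffered record is this old
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private long oldestRecordMillis = -1;

    FirehoseStyleBuffer(int maxBytes, long maxAgeMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Buffers one record; returns a compressed batch if a threshold was reached, else null. */
    byte[] add(byte[] record, long nowMillis) throws IOException {
        if (buffer.size() == 0) {
            oldestRecordMillis = nowMillis;
        }
        buffer.write(record); // records are concatenated as-is
        boolean sizeReached = buffer.size() >= maxBytes;
        boolean ageReached = nowMillis - oldestRecordMillis >= maxAgeMillis;
        return (sizeReached || ageReached) ? flush() : null;
    }

    /** Compresses and clears whatever is currently buffered. */
    byte[] flush() throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            buffer.writeTo(gzip);
        }
        buffer.reset();
        oldestRecordMillis = -1;
        return compressed.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        FirehoseStyleBuffer buf = new FirehoseStyleBuffer(1024, 60_000);
        byte[] line = "{\"level\":\"INFO\",\"msg\":\"ok\"}\n".getBytes(StandardCharsets.UTF_8);
        for (long t = 0; t < 100; t++) {
            byte[] batch = buf.add(line, t * 100);
            if (batch != null) {
                System.out.println("delivering " + batch.length + " compressed bytes");
            }
        }
    }
}
```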
Amazon Kinesis Analytics
• Process streaming data in real time with standard SQL
• Query streaming data or build entire streaming applications using SQL, so
that you can gain actionable insights and respond to your business and
customer needs promptly.
• Scales automatically to match the volume and throughput rate of your
incoming data
• Only pay for the resources your queries consume. There is no minimum fee
or setup cost.
Amazon Kinesis Analytics
Step 1: Configure the input stream
Step 2: Write your SQL queries
Step 3: Configure the output stream
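The SQL written in step 2 often aggregates over time windows. To illustrate what a tumbling-window `GROUP BY` computes, this Java sketch counts events per key in fixed, non-overlapping windows; the input arrays and `TumblingWindowCount` are hypothetical, and this is not Kinesis Analytics code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Counts events per key in fixed, non-overlapping time windows (hypothetical helper). */
class TumblingWindowCount {
    /** Returns windowStart -> (key -> count) for windows of windowMillis. */
    static Map<Long, Map<String, Integer>> count(long[] timestampsMillis, String[] keys, long windowMillis) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Each event belongs to exactly one window: [windowStart, windowStart + windowMillis)
            long windowStart = Math.floorDiv(timestampsMillis[i], windowMillis) * windowMillis;
            result.computeIfAbsent(windowStart, w -> new TreeMap<>()).merge(keys[i], 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ts = {0, 10_000, 59_999, 60_000, 61_000};
        String[] keys = {"alert", "heartbeat", "alert", "alert", "alert"};
        System.out.println(count(ts, keys, 60_000)); // {0={alert=2, heartbeat=1}, 60000={alert=2}}
    }
}
```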
Thank you!
Time to show you real life examples
from OpsGenie

How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 

Dernier (20)

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

AWS Kinesis - Streams, Firehose, Analytics

  • 1. SERHAT CAN • @SRHTCN AWS Kinesis •
  • 3. Table of Contents Streaming data? Big Data Processing Approaches AWS Kinesis Family Amazon Kinesis Streams in detail Amazon Kinesis Firehose Amazon Kinesis Analytics
  • 4. Streaming Data: Life As It Happens After the event occurs -> at rest (batch) As the event occurs -> in motion (streaming)
  • 5. Big Data Processing Approaches • Common Big Data Processing Approaches • Query Engine Approach (Data Warehouse, SQL, NoSQL Databases) • Repeated queries over the same well-structured data • Pre-computations like indices and dimensional views improve query performance • Batch Engines (Map-Reduce) • The “query” is run on the data. There are no pre-computations • Streaming Big Data Processing Approach • Real-time response to content in semi-structured data streams • Relatively simple computations on data (aggregates, filters, sliding window, etc.) • Enables data lifecycle by moving data to different stores / open source systems
  • 7. Amazon Kinesis Streams • A fully managed service for real-time processing of high-volume, streaming data. • Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. • Data is replicated across multiple Availability Zones to ensure high durability and availability.
  • 9. Shard • Streams are made of shards. A shard is the base throughput unit of an Amazon Kinesis stream. • One shard provides a capacity of 1 MB/sec of data input and 2 MB/sec of data output. • One shard can support up to 1,000 PUT records per second. • You can monitor shard-level metrics in Amazon Kinesis Streams. • You can add or remove shards dynamically as your data throughput changes by resharding the stream.
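You can turn the three per-shard limits into a rough shard count. Work out how many shards each limit alone would require, then take the largest. This is a minimal sizing sketch. It assumes the limits stated on this slide and treats output as the combined read rate of all consumers. `shards_needed` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard limits as stated on the slide (assumed unchanged for your account/region).
SHARD_IN_MB_PER_SEC = 1.0
SHARD_OUT_MB_PER_SEC = 2.0
SHARD_PUT_RECORDS_PER_SEC = 1000


def shards_needed(in_mb_per_sec: float, out_mb_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits at once."""
    return max(
        1,
        math.ceil(in_mb_per_sec / SHARD_IN_MB_PER_SEC),
        math.ceil(out_mb_per_sec / SHARD_OUT_MB_PER_SEC),
        math.ceil(records_per_sec / SHARD_PUT_RECORDS_PER_SEC),
    )


# 3.5 MB/sec in, two consumers reading everything (7 MB/sec out), 2,500 records/sec
print(shards_needed(3.5, 7.0, 2500))  # 4
```

In the example, input needs 4 shards, output needs 4 and the record rate needs 3. The stream therefore needs 4 shards.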
  • 10. Data Record • A record is the unit of data stored in an Amazon Kinesis stream. • A record is composed of: • a partition key • a sequence number • a data blob (the data you want to send) • The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
  • 11. Partition Key • The partition key is used to segregate and route data records to different shards of a stream. • The partition key is specified by your data producer when it puts data into an Amazon Kinesis stream. • For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
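Under the hood, Kinesis maps each partition key to a 128-bit integer using an MD5 hash. It then routes the record to the shard whose hash key range contains that value. Which shard Key A and Key B land in therefore depends on their hashes and on the shards' ranges. Producers that need a fixed placement can pass an explicit hash key instead (the `ExplicitHashKey` parameter of PutRecord).

Below is a minimal sketch of that routing. It assumes UTF-8 encoded keys, a big-endian reading of the MD5 digest, and evenly split hash key ranges. The helper names are hypothetical.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys span 0 .. 2^128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count: int):
    """Hypothetical evenly split (start, end) hash key ranges, one per shard."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole key space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is outside every shard range")
```

Because the hash is deterministic, every record with the same partition key goes to the same shard as long as the shard layout does not change.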
  • 12. Sequence Number • Each data record has a sequence number that is unique within its shard. • The sequence number is assigned by Streams after you write to the stream with client.putRecords or client.putRecord. • Sequence numbers for the same partition key generally increase over time; the longer the time period between write requests, the larger the sequence numbers become.
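Sequence numbers can be seen directly in the PutRecord response. PutRecord also accepts a `SequenceNumberForOrdering` parameter. Setting it to the previous record's sequence number guarantees strictly increasing sequence numbers for puts from the same client to the same partition key.

The sketch below assumes a boto3 Kinesis client (`boto3.client("kinesis")`) passed in as `client`. The client is injected so the function can be exercised with a stub. `put_in_order`, the stream name and the partition key are hypothetical.

```python
MAX_BLOB_BYTES = 1024 * 1024  # 1 MB data blob limit, per the Data Record slide


def put_in_order(client, stream_name, partition_key, payloads):
    """Put each payload with the same partition key, chaining
    SequenceNumberForOrdering to the previous record's sequence number.
    Returns the sequence numbers assigned by Kinesis, in write order."""
    previous = None
    sequence_numbers = []
    for data in payloads:
        if len(data) > MAX_BLOB_BYTES:
            raise ValueError("data blob larger than 1 MB")
        request = {"StreamName": stream_name, "Data": data, "PartitionKey": partition_key}
        if previous is not None:
            request["SequenceNumberForOrdering"] = previous
        response = client.put_record(**request)
        previous = response["SequenceNumber"]
        sequence_numbers.append(previous)
    return sequence_numbers


# Usage (requires AWS credentials and an existing stream; names are hypothetical):
#   import boto3
#   put_in_order(boto3.client("kinesis"), "my-stream", "user-42", [b"a", b"b"])
```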
  • 13. Resharding the Stream • Streams supports resharding, which enables you to adjust the number of shards in your stream in order to adapt to changes in the rate of data flow through the stream. • There are two types of resharding operations: shard split and shard merge. • Shard split: divide a single shard into two shards. • Shard merge: combine two shards into a single shard.
  • 14. Resharding the Stream • Resharding is always “pairwise”: you cannot split a shard into more than two shards, or merge more than two shards, in a single operation. • Resharding is typically performed by an administrative application that is distinct from the producer (put) applications and the consumer (get) applications. • The administrative application also needs a broader set of IAM permissions for resharding.
  • 15. Splitting a Shard • Specify how hash key values from the parent shard should be redistributed to the child shards. • The possible hash key values for a given shard constitute a set of ordered, contiguous, non-negative integers. This range of possible hash key values is given by shard.getHashKeyRange().getStartingHashKey(); shard.getHashKeyRange().getEndingHashKey(); • When you split the shard, you specify a value in this range. • That hash key value and all higher hash key values are distributed to one of the child shards. • All the lower hash key values are distributed to the other child shard.
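The most common split cuts a shard in half: you pass the midpoint of the parent's hash key range as the new starting hash key.

This minimal sketch assumes a boto3 Kinesis client and a shard dict shaped like one entry of a ListShards or DescribeStream response, where hash keys arrive as decimal strings. `midpoint_hash_key` and `split_in_half` are hypothetical helpers wrapping the real `split_shard` call.

```python
def midpoint_hash_key(starting_hash_key: str, ending_hash_key: str) -> str:
    """NewStartingHashKey that gives each child about half of the parent's range.
    The returned key and everything above it go to one child; lower keys go to the other."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    if end <= start:
        raise ValueError("a single-key range cannot be split")
    return str((start + end + 1) // 2)


def split_in_half(client, stream_name: str, shard: dict) -> str:
    """Split `shard` (a ListShards-style dict) at the midpoint of its hash key range."""
    key_range = shard["HashKeyRange"]
    new_key = midpoint_hash_key(key_range["StartingHashKey"], key_range["EndingHashKey"])
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=new_key,
    )
    return new_key
```

For a parent covering 0 to 99, the new starting key is 50. The children cover 0 to 49 and 50 to 99.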
  • 16. Merging Two Shards • To merge two shards, the shards must be adjacent. • Two shards are considered adjacent if the union of their hash key ranges forms a contiguous set with no gaps. • To identify shards that are candidates for merging, filter out all shards that are in a CLOSED state. • Shards that are OPEN (that is, not CLOSED) have an ending sequence number of null.
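Finding merge candidates follows the two rules above. Keep only OPEN shards (those with no ending sequence number), sort them by starting hash key, and pair neighbours whose ranges touch.

This sketch assumes a shard list shaped like ListShards output. `adjacent_open_pairs` is hypothetical. You could pass each returned pair to boto3's `merge_shards(StreamName=..., ShardToMerge=..., AdjacentShardToMerge=...)`.

```python
def is_open(shard: dict) -> bool:
    """An OPEN shard has no EndingSequenceNumber in its SequenceNumberRange."""
    return shard["SequenceNumberRange"].get("EndingSequenceNumber") is None


def adjacent_open_pairs(shards):
    """(ShardId, ShardId) pairs of OPEN shards whose hash key ranges are contiguous."""
    open_shards = sorted(
        (s for s in shards if is_open(s)),
        key=lambda s: int(s["HashKeyRange"]["StartingHashKey"]),
    )
    pairs = []
    for left, right in zip(open_shards, open_shards[1:]):
        left_end = int(left["HashKeyRange"]["EndingHashKey"])
        right_start = int(right["HashKeyRange"]["StartingHashKey"])
        if left_end + 1 == right_start:  # no gap between the two ranges
            pairs.append((left["ShardId"], right["ShardId"]))
    return pairs
```

The OPEN shards of a stream normally cover the whole hash key space together, so neighbouring OPEN shards are usually adjacent. The explicit gap check guards against an incomplete shard list.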
  • 17. After Resharding • After you call a resharding operation, either splitShard or mergeShards, you need to wait for the stream to become ACTIVE again, just as after creating a stream. • During resharding, a parent shard transitions from an OPEN state to a CLOSED state to an EXPIRED state. • When resharding completes, the stream is back in the ACTIVE state.
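Waiting for the stream to return to ACTIVE is a simple polling loop. This minimal sketch assumes a boto3 Kinesis client. It reads `StreamStatus` from `describe_stream_summary` (the stream reports UPDATING while it is being resharded) and gives up after a timeout.

The helper name and default intervals are hypothetical. boto3 also provides a `stream_exists` waiter that polls until the stream is ACTIVE.

```python
import time


def wait_until_active(client, stream_name: str, poll_seconds: float = 5.0,
                      timeout_seconds: float = 600.0, sleep=time.sleep) -> None:
    """Poll the stream status until it is ACTIVE, or raise TimeoutError."""
    waited = 0.0
    while True:
        summary = client.describe_stream_summary(StreamName=stream_name)
        status = summary["StreamDescriptionSummary"]["StreamStatus"]
        if status == "ACTIVE":
            return
        if waited >= timeout_seconds:
            raise TimeoutError(f"{stream_name} is still {status} after {waited:.0f}s")
        sleep(poll_seconds)  # injectable so tests do not actually sleep
        waited += poll_seconds
```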
  • 18. Retention Period • Data records are accessible for a default of 24 hours from the time they are added to a stream • Configurable in hourly increments • From 24 to 168 hours (1 to 7 days)
  • 19. Amazon Kinesis Producer Library (KPL) • The KPL is an easy-to-use, highly configurable library that helps you write to an Amazon Kinesis stream. • Writes to one or more Amazon Kinesis streams with an automatic and configurable retry mechanism • Collects records and uses PutRecords to write multiple records to multiple shards per request • Aggregates user records to increase payload size and improve throughput • Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate batched records on the consumer • Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
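The collect-and-retry part of what the KPL does can be illustrated in a few lines. This sketch is not the KPL and performs no record aggregation.

It assumes a boto3 Kinesis client and the PutRecords limits of 500 records and 5 MB per request. PutRecords can partially succeed, so the retry resends only the entries whose result carries an `ErrorCode`. The function names and the backoff schedule are hypothetical.

```python
import time

MAX_RECORDS_PER_REQUEST = 500            # PutRecords record limit
MAX_BYTES_PER_REQUEST = 5 * 1024 * 1024  # 5 MB per request, partition keys included


def batch_for_put_records(records):
    """Group (partition_key, data_bytes) pairs into lists of PutRecords entries
    that respect the per-request record and size limits."""
    batch, batch_bytes = [], 0
    for key, data in records:
        size = len(data) + len(key.encode("utf-8"))
        full = len(batch) == MAX_RECORDS_PER_REQUEST
        too_big = batch_bytes + size > MAX_BYTES_PER_REQUEST
        if batch and (full or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"Data": data, "PartitionKey": key})
        batch_bytes += size
    if batch:
        yield batch


def put_with_retry(client, stream_name, entries, max_attempts=3, sleep=time.sleep):
    """Send one batch and resend only the failed entries, with exponential backoff.
    Returns the entries that still failed after max_attempts."""
    for attempt in range(max_attempts):
        response = client.put_records(StreamName=stream_name, Records=entries)
        if response["FailedRecordCount"] == 0:
            return []
        # Results come back in request order; failed ones carry an ErrorCode.
        entries = [e for e, r in zip(entries, response["Records"]) if "ErrorCode" in r]
        if attempt < max_attempts - 1:
            sleep(0.1 * 2 ** attempt)
    return entries
```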
  • 20. Amazon Kinesis Client Library (Life Saver) • Develop a consumer application for Amazon Kinesis Streams. • The KCL acts as an intermediary between your record processing logic and Streams. • A KCL application instantiates a worker with configuration information, and then uses a record processor to process the data received from an Amazon Kinesis stream. • You can run a KCL application on any number of instances. Multiple instances of the same application coordinate on failures and load-balance dynamically. • You can also have multiple KCL applications working on the same stream, subject to throughput limits.
  • 21. Amazon Kinesis Client Library • Connects to the stream • Enumerates the shards • Coordinates shard associations with other workers (if any) • Instantiates a record processor for every shard it manages • Pulls data records from the stream • Pushes the records to the corresponding record processor • Checkpoints processed records • Balances shard-worker associations when the worker instance count changes • Balances shard-worker associations when shards are split or merged
  • 22. Amazon Kinesis Client Library • KCL uses a unique Amazon DynamoDB table to keep track of the application's state • KCL creates the table with a provisioned throughput of 10 reads per second and 10 writes per second • Each row in the DynamoDB table represents a shard that is being processed by your application. The hash key for the table is the shard ID.
  • 23. Amazon Kinesis Client Library • In addition to the shard ID, each row also includes the following data: • checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across all shards in the stream. • checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature, this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record. • leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by another worker. • leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held by one worker at a time. • leaseOwner: The worker that is holding this lease. • ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last time a checkpoint was written. • parentShardId: Used to ensure that the parent shard is fully processed before processing starts on the child shards. This ensures that records are processed in the same order they were put into the stream.
  • 24. Using Shard Iterators • You retrieve records from the stream on a per-shard basis, starting from the position given by the shard iterator type: • AT_SEQUENCE_NUMBER • AFTER_SEQUENCE_NUMBER • AT_TIMESTAMP • TRIM_HORIZON • LATEST
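A bare-bones consumer for a single shard shows how an iterator type is used. You request an iterator, then follow `NextShardIterator` from one GetRecords page to the next.

This minimal sketch assumes a boto3 Kinesis client. `read_shard` and its defaults are hypothetical. It is roughly the per-shard loop the KCL runs for you, minus checkpointing, lease management and pacing. A production consumer would also pause between calls to stay within the per-shard read limits.

```python
def read_shard(client, stream_name, shard_id, iterator_type="TRIM_HORIZON",
               max_batches=10, limit=100, **iterator_args):
    """Read up to max_batches GetRecords pages from one shard.

    iterator_args carries StartingSequenceNumber (for AT_/AFTER_SEQUENCE_NUMBER)
    or Timestamp (for AT_TIMESTAMP)."""
    response = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType=iterator_type,
        **iterator_args,
    )
    iterator = response["ShardIterator"]
    records = []
    for _ in range(max_batches):
        if iterator is None:
            break  # the shard is CLOSED and has been read to its end
        page = client.get_records(ShardIterator=iterator, Limit=limit)
        records.extend(page["Records"])
        iterator = page.get("NextShardIterator")
    return records
```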
  • 25. Recovering from Failures • Record Processor Failure • The worker invokes record processor methods using Java ExecutorService tasks. • If a task fails, the worker retains control of the shard that the record processor was processing. • The worker starts a new record processor task to process that shard. • Worker or Application Failure • If a worker (or an instance of the Amazon Kinesis Streams application) fails, you should detect and handle the situation.
  • 26. Handling Duplicate Records (Idempotency) • There are two primary reasons why records may be delivered more than one time to your Amazon Kinesis Streams application: • producer retries • consumer retries • Your application must anticipate and appropriately handle processing individual records multiple times.
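A producer retry writes the same event again as a new record with a new sequence number. Deduplicating on sequence numbers alone therefore covers only consumer retries. A common approach is for the producer to embed a unique ID in each record, and for the consumer to skip IDs it has already applied.

This minimal sketch assumes JSON payloads with hypothetical `event_id` and `amount` fields. The in-memory set stands in for the durable store a real application would need.

```python
import json


class IdempotentProcessor:
    """Applies each logical event at most once, keyed by a producer-assigned ID."""

    def __init__(self):
        self.seen_ids = set()  # stand-in for a durable store of applied IDs
        self.total = 0         # example side effect: a running sum

    def process(self, record: dict) -> bool:
        """Apply the record's event; return False if it was a duplicate."""
        event = json.loads(record["Data"])
        event_id = event["event_id"]
        if event_id in self.seen_ids:
            return False  # duplicate caused by a producer or consumer retry
        self.total += event["amount"]
        # Ideally the effect and the ID are stored together, atomically.
        self.seen_ids.add(event_id)
        return True
```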
  • 27. Pricing • Shard Hour (1 MB/second ingress, 2 MB/second egress): $0.015 • PUT Payload Units, per 1,000,000 units: $0.014 • Extended Data Retention (up to 7 days), per Shard Hour: $0.020 • DynamoDB charges apply if you use the KCL
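You can turn the price list into a rough monthly estimate. This sketch uses the prices on this slide, which may have changed since. It assumes:

- a 720-hour month;
- a PUT payload unit is a 25 KB chunk of a record, taken here as 25 × 1024 bytes;
- each record is rounded up to whole units.

DynamoDB charges for the KCL are not included. `monthly_cost` and `put_units` are hypothetical helpers.

```python
import math

SHARD_HOUR_USD = 0.015
PUT_UNITS_PER_MILLION_USD = 0.014
EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020
PUT_UNIT_BYTES = 25 * 1024  # assumed size of one PUT payload unit


def put_units(record_bytes: int) -> int:
    """PUT payload units billed for one record (at least one unit)."""
    return max(1, math.ceil(record_bytes / PUT_UNIT_BYTES))


def monthly_cost(shards, records_per_sec, record_bytes, hours=720, extended_retention=False):
    """Estimated USD cost for one month of a stream with constant traffic."""
    shard_hours = shards * hours
    cost = shard_hours * SHARD_HOUR_USD
    units = records_per_sec * 3600 * hours * put_units(record_bytes)
    cost += units / 1_000_000 * PUT_UNITS_PER_MILLION_USD
    if extended_retention:
        cost += shard_hours * EXTENDED_RETENTION_SHARD_HOUR_USD
    return cost
```

Consider 2 shards ingesting 1,000 records/sec of 3 KB each. Under these assumptions, shard hours cost 2 × 720 × $0.015 = $21.60. PUT units cost 2,592 million × $0.014 = $36.29. The total is about $57.89 per month.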
  • 28. Kafka vs. Kinesis Streams • In Kafka you can configure, for each topic, the replication factor and how many replicas have to acknowledge a message before it is considered successful, so you can definitely make it highly available. • Amazon ensures that you won't lose data, but that comes with a performance cost (messages are written to 3 different AZs synchronously). • There are several benchmarks online comparing Kafka and Kinesis, and the result is always the same: you will have a hard time replicating Kafka's performance in Kinesis, at least for a reasonable price. • This is partly because Kafka is extremely fast, but also because Kinesis writes each message synchronously to 3 different machines, which is quite costly in terms of latency and throughput. • Kafka is one of the preferred options for the Apache stream processing frameworks. • Unsurprisingly, Kinesis is really well integrated with other AWS services.
  • 29. DynamoDB Streams vs. Kinesis Streams • Although DynamoDB Streams actions are similar to their counterparts in Amazon Kinesis Streams, they are not 100% identical. • You can write applications for Amazon Kinesis Streams using the Amazon Kinesis Client Library (KCL). • You can leverage the design patterns found within the KCL to process DynamoDB Streams shards and stream records. To do this, you use the DynamoDB Streams Kinesis Adapter.
  • 30. SQS vs. Kinesis Streams • Amazon Kinesis Streams enables real-time processing of streaming big data. • It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. • The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering). • Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. • Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows.
  • 32. Amazon Kinesis Firehose • Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. • It can capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service • Fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. • It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.
  • 33. Amazon Kinesis Analytics • Process streaming data in real time with standard SQL • Query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights and respond to your business and customer needs promptly. • Scales automatically to match the volume and throughput rate of your incoming data • Only pay for the resources your queries consume. There is no minimum fee or setup cost.
  • 34. Amazon Kinesis Analytics Step 1: Configure Input Stream Step 2: Write your SQL queries Step 3: Configure Output Stream
  • 35. Thank you! Time to show you real life examples from OpsGenie