Fernando Rodriguez Olivera
@frodriguez
Buenos Aires, Argentina, Dec 2015
Amazon Kinesis
AWS User Group Argentina
Twitter: @frodriguez
Professor at Universidad Austral (Distributed Systems, Compiler Design, Operating Systems, …)
Creator of mvnrepository.com
Organizer at Buenos Aires High Scalability Group
Fernando Rodriguez Olivera
Amazon Kinesis Streams
High-throughput, low-latency service for real-time data processing over large, distributed data streams
Kinesis Streams
[Diagram: producers write to a Kinesis stream; App #1 and App #2 read from it]
Data retention between 24 and 168 hrs
Designed for < 1 sec latency
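Shards
[Diagram: producers send records through the Kinesis endpoints to a stream with three shards (Shard 1, Shard 2, Shard 3); the shards hold records keyed PK9 PK9, PK7 PK1 PK1 and PK3 PK6; App #1 and App #2 read from the stream]
Records annotated with the same Partition Key (PK) are stored in the same shard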
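The rule above follows from how Kinesis routes records: the partition key is hashed with MD5 into a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The sketch below illustrates the idea in plain Java. It assumes the hash space is split into equal-width ranges, which is only an illustration (a real stream's ranges should be read from describe-stream), and the names ShardRouter, hashKey and shardIndex are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShardRouter {
    // Size of the 128-bit hash key space: 2^128
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Shard index under the ASSUMPTION of shardCount equal-width hash ranges
    static int shardIndex(String partitionKey, int shardCount) {
        BigInteger rangeWidth = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeWidth).intValue();
        // The last range absorbs the remainder of the integer division
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        // The same key always lands on the same shard index
        for (String pk : new String[] {"PK1", "PK1", "PK7", "PK9"}) {
            System.out.println(pk + " -> shard index " + shardIndex(pk, 3));
        }
    }
}
```

Because the mapping depends only on the key, a heavily used partition key concentrates its traffic on a single shard.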
Shard Capacity
                         per second    per hour     per day
New Records (writes)     1 MB          3.6 GB       86.4 GB
                         1K put        3.6 M put    86.4 M put
Get Records (reads)      2 MB          7.2 GB       172.8 GB
                         5 tx          18k tx       432k tx
24h Retention: Max 86.4GB per shard
168h Retention: Max 604.8GB per shard
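These per-shard limits translate directly into a sizing rule: a stream needs enough shards to satisfy whichever of the three limits is hit first. A minimal sketch, assuming decimal megabytes as in the figures above; the class ShardSizing and its method names are hypothetical.

```java
public class ShardSizing {
    // Per-shard limits taken from the slide (decimal MB assumed)
    static final long WRITE_BYTES_PER_SEC = 1_000_000; // 1 MB/s in
    static final long PUTS_PER_SEC = 1_000;            // 1K put/s
    static final long READ_BYTES_PER_SEC = 2_000_000;  // 2 MB/s out

    // Integer division rounded up, for non-negative a and positive b
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Smallest shard count that satisfies all three throughput limits
    static long shardsNeeded(long writeBytesPerSec, long putsPerSec, long readBytesPerSec) {
        long forWrites = ceilDiv(writeBytesPerSec, WRITE_BYTES_PER_SEC);
        long forPuts = ceilDiv(putsPerSec, PUTS_PER_SEC);
        long forReads = ceilDiv(readBytesPerSec, READ_BYTES_PER_SEC);
        // A stream always has at least one shard
        return Math.max(1, Math.max(forWrites, Math.max(forPuts, forReads)));
    }

    public static void main(String[] args) {
        // 5,000 records/s of 500 bytes each, read in full by two applications
        long shards = shardsNeeded(2_500_000, 5_000, 5_000_000);
        System.out.println("shards needed: " + shards); // prints "shards needed: 5"
    }
}
```

In this example the put rate (5,000 put/s), not the byte volume, is the binding limit.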
Shard Pricing
24h Retention: $0.015/hr ($11/month)
Extended Retention (extra per shard): $0.020/hr ($14.6/month)
Up to 168h Retention (24h + extended): $0.035/hr ($25.6/month)
+ $0.014 per 1,000,000 PUT Payload Units (1 unit = 25KB)
* Prices for us-east
Max Record Size = 1MB
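The two price components combine as shard-hours plus PUT Payload Units, where each record is billed in 25KB chunks rounded up. The sketch below estimates a monthly bill from the Dec 2015 us-east prices on this slide; current prices may differ. It assumes 1 unit = 25 * 1024 bytes, and the class KinesisCost and its methods are hypothetical.

```java
public class KinesisCost {
    // Prices as quoted on the slide (us-east, Dec 2015)
    static final double SHARD_HOUR_USD = 0.015;
    static final double EXTENDED_RETENTION_SHARD_HOUR_USD = 0.020;
    static final double USD_PER_MILLION_PUT_UNITS = 0.014;
    // ASSUMPTION: a 25KB payload unit is 25 * 1024 bytes
    static final int PUT_UNIT_BYTES = 25 * 1024;

    // Number of PUT Payload Units billed for one record (rounded up, at least 1)
    static long putPayloadUnits(int recordBytes) {
        return Math.max(1, (recordBytes + PUT_UNIT_BYTES - 1) / PUT_UNIT_BYTES);
    }

    // Estimated monthly cost: shard-hours plus PUT Payload Units
    static double monthlyCostUsd(int shards, boolean extendedRetention,
                                 long recordsPerMonth, int recordBytes,
                                 double hoursPerMonth) {
        double hourly = SHARD_HOUR_USD
                + (extendedRetention ? EXTENDED_RETENTION_SHARD_HOUR_USD : 0.0);
        double shardCost = shards * hourly * hoursPerMonth;
        double units = (double) recordsPerMonth * putPayloadUnits(recordBytes);
        double putCost = units / 1_000_000.0 * USD_PER_MILLION_PUT_UNITS;
        return shardCost + putCost;
    }

    public static void main(String[] args) {
        // 2 shards, 24h retention, 100 records/s of 1 KB over a 720-hour month
        long records = 100L * 3600 * 720;
        double cost = monthlyCostUsd(2, false, records, 1024, 720);
        System.out.println("estimated USD per month: " + cost);
    }
}
```

With these inputs the estimate is about $25.23: $21.60 for shard-hours and roughly $3.63 for 259.2 million payload units.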
Kinesis from AWS CLI
aws kinesis create-stream --stream-name myStream \
    --shard-count 1
aws kinesis list-streams
{
    "StreamNames": [
        "myStream"
    ]
}
aws kinesis put-record --stream-name myStream \
    --partition-key 123 \
    --data "my data"
Collecting Records from SDK
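// Synchronous client: putRecord blocks until Kinesis responds
AmazonKinesis kinesis = new AmazonKinesisClient(…);
PutRecordResult result = kinesis.putRecord(new PutRecordRequest()
    .withStreamName("myStream")
    .withPartitionKey("partitionKey")
    .withData(bytes));   // bytes is a java.nio.ByteBuffer

or

// Asynchronous client: putRecordAsync returns a Future immediately
AmazonKinesisAsync kinesis = new AmazonKinesisAsyncClient(…);
Future<PutRecordResult> future = kinesis.putRecordAsync(new PutRecordRequest()
    .withStreamName("myStream")
    .withPartitionKey("partitionKey")
    .withData(bytes));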
Collecting Records (Batch)
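AmazonKinesis kinesis = new AmazonKinesisClient(…);
...
// records is a List<PutRecordsRequestEntry>; each entry carries its own partition key
records.add(new PutRecordsRequestEntry()
    .withPartitionKey("partitionKey")
    .withData(bytes));
records.add(…);

// One request sends the whole batch to the stream
PutRecordsResult results = kinesis.putRecords(new PutRecordsRequest()
    .withStreamName("myStream")
    .withRecords(records));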

KPL (Kinesis Producer Library)
[Diagram: records pass through the KPL stages: aggregation, buffering, collection w/PutRequests]
Collecting with KPL
config = new KinesisProducerConfiguration()
    .setRecordMaxBufferedTime(200) // millis
    .setMaxConnections(4)
    .setRequestTimeout(60000)
    .setRegion("us-east-1");
producer = new KinesisProducer(config);
producer.addUserRecord("myStream", "partitionKey1", bytes1);
producer.addUserRecord("myStream", "partitionKey2", bytes2);
Consumer APIs
High-level API (KCL = Kinesis Client Library)
Low-level API (with shard iterators)
Low-Level API with Shard Iterators
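[Diagram: a shard holding all records from the last 24 hs; new records are appended at one end and GetRecords reads from the position selected by the shard iterator]
TRIM_HORIZON: start at the oldest record still stored in the shard
AT_SEQUENCE_NUMBER: start at the record with the given sequence number
AFTER_SEQUENCE_NUMBER: start right after the record with the given sequence number
LATEST: start after the most recent record, so only new records are returned
Max 5 read transactions per second per shard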
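The four iterator types differ only in where reading starts within the shard. A simplified in-memory model may make that concrete: the shard is a list of sequence numbers ordered oldest first, and each type resolves to a start index. Real sequence numbers are long numeric strings, not small integers, and the class ShardIteratorModel is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ShardIteratorModel {
    enum IteratorType { TRIM_HORIZON, AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, LATEST }

    // Index in sequenceNumbers (ascending, oldest first) where reading would start.
    // sequenceNumber is ignored for TRIM_HORIZON and LATEST.
    static int startIndex(List<Long> sequenceNumbers, IteratorType type, long sequenceNumber) {
        switch (type) {
            case TRIM_HORIZON:
                return 0; // oldest stored record
            case LATEST:
                return sequenceNumbers.size(); // past the newest record
            case AT_SEQUENCE_NUMBER:
                return indexOfOrThrow(sequenceNumbers, sequenceNumber);
            case AFTER_SEQUENCE_NUMBER:
                return indexOfOrThrow(sequenceNumbers, sequenceNumber) + 1;
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    static int indexOfOrThrow(List<Long> sequenceNumbers, long sequenceNumber) {
        int index = sequenceNumbers.indexOf(sequenceNumber);
        if (index < 0) {
            throw new IllegalArgumentException("No record with sequence number " + sequenceNumber);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Long> shard = Arrays.asList(100L, 101L, 105L);
        for (IteratorType type : IteratorType.values()) {
            System.out.println(type + " -> index " + startIndex(shard, type, 101L));
        }
    }
}
```

An index equal to the list size means nothing is returned until a new record arrives, which is the LATEST behaviour.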
Kinesis from AWS CLI
aws kinesis describe-stream --stream-name myStream
{
    "StreamDescription": {
        "StreamStatus": "ACTIVE",
        "StreamName": "myStream",
        "StreamARN": "arn:aws:kinesis:…:stream/myStream",
        "Shards": [
            {
                "ShardId": "shardId-000000000000",
                "HashKeyRange": {
                    "EndingHashKey": "…",
                    "StartingHashKey": "…"
                },
                "SequenceNumberRange": {
                    "StartingSequenceNumber": "…"
                }
            }
        ]
    }
}
Kinesis from AWS CLI
aws kinesis get-shard-iterator --stream-name myStream \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON
{
    "ShardIterator": "… iterator id …"
}
aws kinesis get-records --shard-iterator "… iterator id …"
{
    "Records": [
        {
            "Data": "...",
            "PartitionKey": "...",
            "SequenceNumber": "..."
        }
    ],
    "MillisBehindLatest": 1000,
    "NextShardIterator": "… new iterator id …"
}
Splitting/Merging Shards
[Diagram: a parent shard (CLOSED) split into two child shards (OPEN)]
Old records remain at the parent
New events are added to the children
After 24 hs the parent's state changes from CLOSED to EXPIRED
GetRecords consumes from the parent using 1 shard iterator until the split is detected.
Then 2 iterators are required to consume from the children.
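To keep records in order across a split, a consumer can finish the parent before it starts on the children. The sketch below models that rule with a parent map, similar in spirit to the ParentShardId field that describe-stream reports for child shards. It covers splits only (a merged shard has two parents, which this model does not handle), and the class ShardLineage is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShardLineage {
    // Shards that may be read now: not yet finished, and with no parent
    // or with a parent that has already been fully consumed.
    static List<String> readableNow(Map<String, String> parentOf, Set<String> finished) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentOf.entrySet()) {
            String shard = entry.getKey();
            String parent = entry.getValue(); // null when the shard has no parent
            boolean parentDone = parent == null || finished.contains(parent);
            if (!finished.contains(shard) && parentDone) {
                result.add(shard);
            }
        }
        Collections.sort(result); // stable output order for the example
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("parent", null);
        parentOf.put("childA", "parent");
        parentOf.put("childB", "parent");

        Set<String> finished = new HashSet<>();
        System.out.println(readableNow(parentOf, finished)); // prints [parent]

        finished.add("parent"); // parent fully consumed after the split
        System.out.println(readableNow(parentOf, finished)); // prints [childA, childB]
    }
}
```

One iterator is enough while only the parent is readable; once it is finished, the result contains both children and two iterators are needed.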
Consuming Records with KCL
[Diagram: a stream with 3 shards read by an app with 2 consumers; machine01 and machine02 each run KCL, with three Record Processors between them (one per shard)]
KCL (Kinesis Client Library)
Shard processing is balanced across nodes
If a node fails, its shards are re-assigned to the remaining nodes
KCL Coordination w/DynamoDB
[Diagram: an app with 2 consumer nodes; machine01 and machine02 each run KCL with their Record Processors and coordinate through a DynamoDB table (TableName=AppID)]
lease key    checkpoint    lease counter    lease owner
shard01      …             123              machine01
shard02      …             234              machine01
shard03      …             345              machine02
The lease counter is continuously incremented (as a heart-beat)
The App Id is used as the table name. Leases are written to DynamoDB with conditional updates
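The conditional update is what makes the lease table safe: a write succeeds only if the row still looks the way the writer last saw it. The sketch below imitates that with an in-memory map, where ConcurrentHashMap.replace(key, expected, updated) stands in for a DynamoDB conditional write. It illustrates the idea only and is not the KCL's actual algorithm; the class LeaseTable and its methods are hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class LeaseTable {
    // Immutable snapshot of one lease row: owner plus heartbeat counter
    static final class Lease {
        final String owner;
        final long counter;

        Lease(String owner, long counter) {
            this.owner = owner;
            this.counter = counter;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Lease)) return false;
            Lease other = (Lease) o;
            return counter == other.counter && owner.equals(other.owner);
        }

        @Override
        public int hashCode() {
            return Objects.hash(owner, counter);
        }
    }

    private final ConcurrentHashMap<String, Lease> leases = new ConcurrentHashMap<>();

    void create(String shardId, String owner) {
        leases.putIfAbsent(shardId, new Lease(owner, 0));
    }

    Lease read(String shardId) {
        return leases.get(shardId);
    }

    // Heartbeat: bump the counter only if the row is unchanged since it was read
    boolean renew(String shardId, Lease observed) {
        return leases.replace(shardId, observed, new Lease(observed.owner, observed.counter + 1));
    }

    // Takeover: change the owner only if the row is unchanged since it was read
    boolean take(String shardId, Lease observed, String newOwner) {
        return leases.replace(shardId, observed, new Lease(newOwner, observed.counter + 1));
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable();
        table.create("shard01", "machine01");
        Lease seenByMachine02 = table.read("shard01");

        // machine01 heartbeats, so machine02's snapshot becomes stale
        System.out.println(table.renew("shard01", table.read("shard01")));       // prints true
        System.out.println(table.take("shard01", seenByMachine02, "machine02")); // prints false
    }
}
```

A node that sees a counter stop advancing can conclude that the owner has stopped heartbeating and attempt a takeover from a fresh read.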
Consuming Records (KCL)
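class MyProcessor implements IRecordProcessor {
    // initialize(...) and shutdown(...) omitted for brevity
    @Override
    public void processRecords(
            List<Record> records,
            IRecordProcessorCheckpointer checkpointer) {
        for (Record record : records) {
            // Process record …
        }
        try {
            // Persist progress so a restart resumes after these records
            checkpointer.checkpoint();
        } catch (Exception e) {
            // checkpoint() declares checked exceptions; handle or retry here
        }
    }
}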
* KCL available for: Java, Node.js, .NET, Python, Ruby
Thanks,
Fernando Rodriguez Olivera
@frodriguez
frodriguez <at> gmail.com
