2. Fernando Rodriguez Olivera
Twitter: @frodriguez
Professor at Universidad Austral (Distributed Systems, Compiler Design, Operating Systems, …)
Creator of mvnrepository.com
Organizer at Buenos Aires High Scalability Group
7. Shard Pricing
Retention                                 Per shard-hour   Per shard-month
24h Retention                             $0.015/hr        $11/month
Extended Retention                        $0.020/hr        $14.6/month
Up to 168h Retention (24h + Extended)     $0.035/hr        $25.6/month
* Prices for us-east
+ $0.014 per 1,000,000 PUT Payload Units (1 unit = 25KB)
Max Record Size = 1MB
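The monthly figures and the PUT Payload Unit charge above can be reproduced with a little arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, 730 hours per month is an assumption, and 25KB is taken as 25 * 1024 bytes with each record rounded up to whole units.

```java
// Hypothetical helper for the pricing arithmetic; not part of any AWS SDK.
class ShardPricing {
    static final double HOURS_PER_MONTH = 730.0;          // assumption: 365 * 24 / 12
    static final long PAYLOAD_UNIT_BYTES = 25 * 1024;     // assumption: 25KB = 25 * 1024 bytes
    static final double PRICE_PER_MILLION_UNITS = 0.014;  // from the slide (us-east)

    // Monthly cost of one shard for a given hourly price.
    static double monthlyShardCost(double hourlyPrice) {
        return hourlyPrice * HOURS_PER_MONTH;
    }

    // Each record is billed in whole payload units, rounded up.
    static long payloadUnits(long recordBytes) {
        return (recordBytes + PAYLOAD_UNIT_BYTES - 1) / PAYLOAD_UNIT_BYTES;
    }

    // PUT cost for a number of records that all have the same size.
    static double putCost(long records, long recordBytes) {
        return records * payloadUnits(recordBytes) * PRICE_PER_MILLION_UNITS / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println("24h shard per month: " + monthlyShardCost(0.015));
        System.out.println("168h shard per month: " + monthlyShardCost(0.035));
        System.out.println("Units for a 1MB record: " + payloadUnits(1024 * 1024));
    }
}
```

With these assumptions, small records (up to 25KB) cost one unit each, so record count rather than byte volume tends to drive the PUT charge.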
9. Collecting Records from SDK
// Synchronous client
kinesis = new AmazonKinesisClient(…);
result = kinesis.putRecord(new PutRecordRequest()
    .withStreamName("myStream")
    .withPartitionKey("partitionKey")
    .withData(bytes));   // bytes is a java.nio.ByteBuffer

or

// Asynchronous client: putRecordAsync returns a Future
kinesis = new AmazonKinesisAsyncClient(…);
future = kinesis.putRecordAsync(new PutRecordRequest()
    .withStreamName("myStream")
    .withPartitionKey("partitionKey")
    .withData(bytes));
10. Collecting Records (Batch)
kinesis = new AmazonKinesisClient(…);
...
records.add(new PutRecordsRequestEntry()
    .withPartitionKey("partitionKey")
    .withData(bytes));
records.add(…);
results = kinesis.putRecords(new PutRecordsRequest()
    .withStreamName("myStream")
    .withRecords(records));
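A single PutRecords call accepts a bounded number of entries (500 per request, as far as I know; that limit is not stated on the slide), so a large list has to be split before sending. The sketch below shows only the splitting step in plain Java. RecordBatcher and its names are hypothetical, and the actual putRecords call is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits entries into chunks small enough for one batch request.
class RecordBatcher {
    static final int MAX_RECORDS_PER_REQUEST = 500; // assumption: PutRecords per-request limit

    static <T> List<List<T>> batches(List<T> entries, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += maxPerBatch) {
            int end = Math.min(i + maxPerBatch, entries.size());
            out.add(new ArrayList<>(entries.subList(i, end))); // copy so batches are independent
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> entries = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            entries.add(i);
        }
        for (List<Integer> batch : batches(entries, MAX_RECORDS_PER_REQUEST)) {
            // In real code, each batch would go into one PutRecordsRequest.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Each batch would then be passed to withRecords(…) as in the slide. A batch response can report individual failed entries, so callers typically inspect the result and retry those.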
11. KPL (Kinesis Producer Library)
Diagram: records pass through the KPL stages (aggregation, buffering, collection w/PutRequests)
12. Collecting with KPL
config = new KinesisProducerConfiguration()
    .setRecordMaxBufferedTime(200) // millis
    .setMaxConnections(4)
    .setRequestTimeout(60000)
    .setRegion("us-east-1");
producer = new KinesisProducer(config);
producer.addUserRecord("myStream", "partitionKey1", bytes1);
producer.addUserRecord("myStream", "partitionKey2", bytes2);
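To make the effect of a max buffered time more concrete, here is a minimal sketch of time-and-count based buffering. It is not how the KPL is implemented. TimedBuffer, its fields, and the explicit clock parameter are all hypothetical and exist only to show the idea of holding records briefly so that several can be sent together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer: flushes when the oldest record is too old or the buffer is full.
class TimedBuffer {
    private final long maxBufferedMillis;
    private final int maxRecords;
    private final List<byte[]> pending = new ArrayList<>();
    private final List<List<byte[]>> flushed = new ArrayList<>();
    private long oldestMillis;

    TimedBuffer(long maxBufferedMillis, int maxRecords) {
        this.maxBufferedMillis = maxBufferedMillis;
        this.maxRecords = maxRecords;
    }

    // Adds a record; the caller passes the current time to keep the sketch deterministic.
    void add(byte[] record, long nowMillis) {
        tick(nowMillis);
        if (pending.isEmpty()) {
            oldestMillis = nowMillis;
        }
        pending.add(record);
        if (pending.size() >= maxRecords) {
            flush();
        }
    }

    // Flushes if the oldest pending record has waited at least maxBufferedMillis.
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestMillis >= maxBufferedMillis) {
            flush();
        }
    }

    private void flush() {
        flushed.add(new ArrayList<>(pending)); // stands in for sending one batched request
        pending.clear();
    }

    List<List<byte[]>> flushed() {
        return flushed;
    }
}
```

A larger buffered time lets more records share one request at the cost of added latency, which is the trade-off behind the 200 ms setting on the slide.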
14. Low-Level API with Shard Iterators
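Shard iterator types (starting position for GetRecords on a shard):
AT_SEQUENCE_NUMBER: start at the record with the given sequence number
AFTER_SEQUENCE_NUMBER: start right after the record with the given sequence number
TRIM_HORIZON: all records in last 24hs (from the oldest record still in the shard)
LATEST: new records only
Max 5 read transactions per second per shard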
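The four iterator types differ only in where reading starts. The following sketch models a shard as an ascending list of sequence numbers and computes the start position for each type. It is an in-memory illustration with hypothetical names. Real sequence numbers are large numeric strings, and real reads go through GetShardIterator and GetRecords.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of shard iterator start positions.
class ShardIteratorSketch {
    enum IteratorType { AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, TRIM_HORIZON, LATEST }

    // Index of the first record with sequence number >= target.
    static int firstIndexAtLeast(List<Long> sequenceNumbers, long target) {
        int i = 0;
        while (i < sequenceNumbers.size() && sequenceNumbers.get(i) < target) {
            i++;
        }
        return i;
    }

    // sequenceNumber is ignored for TRIM_HORIZON and LATEST.
    static int startIndex(List<Long> sequenceNumbers, IteratorType type, long sequenceNumber) {
        switch (type) {
            case TRIM_HORIZON:
                return 0;                                   // oldest record still retained
            case LATEST:
                return sequenceNumbers.size();              // only records added later
            case AT_SEQUENCE_NUMBER:
                return firstIndexAtLeast(sequenceNumbers, sequenceNumber);
            case AFTER_SEQUENCE_NUMBER:
                return firstIndexAtLeast(sequenceNumbers, sequenceNumber + 1);
            default:
                throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    // Records currently visible from the iterator's start position.
    static List<Long> read(List<Long> sequenceNumbers, IteratorType type, long sequenceNumber) {
        int from = startIndex(sequenceNumbers, type, sequenceNumber);
        return new ArrayList<>(sequenceNumbers.subList(from, sequenceNumbers.size()));
    }
}
```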
16. Kinesis from AWS CLI
aws kinesis get-shard-iterator --stream-name myStream \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON
{
"ShardIterator": "… iterator id …"
}
aws kinesis get-records --shard-iterator "… iterator id …"
{
"Records":[ {
"Data": "...",
"PartitionKey": "...",
"SequenceNumber": "..."
} ],
"MillisBehindLatest": 1000,
"NextShardIterator": "… new iterator id …"
}
17. Splitting/Merging Shards
Diagram: a parent Shard (CLOSED) split into two child Shards (OPEN)
Old records remain at the parent
New events are added to the children
After 24hs the parent's state changes from CLOSED to EXPIRED
GetRecords consumes from the parent using 1 shard iterator until the split is detected.
Then 2 iterators are required to consume from the children
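The consumption order across a split can be sketched with a small in-memory model: drain the closed parent first, then continue with one reader per child. The Shard class and consume method below are hypothetical. In the real API, the end of a closed shard is signalled by GetRecords returning no next iterator, and merges (two parents, one child) would need extra bookkeeping that this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of reading through a shard split (merges are not handled).
class SplitConsumerSketch {
    static class Shard {
        final String id;
        final List<String> records = new ArrayList<>();
        final List<Shard> children = new ArrayList<>(); // non-empty once the shard is CLOSED

        Shard(String id) {
            this.id = id;
        }
    }

    // Reads every shard to its end before moving on to that shard's children.
    static List<String> consume(Shard root) {
        List<String> out = new ArrayList<>();
        List<Shard> active = new ArrayList<>(); // one "iterator" per active shard
        active.add(root);
        while (!active.isEmpty()) {
            List<Shard> next = new ArrayList<>();
            for (Shard shard : active) {
                out.addAll(shard.records);   // drain the shard
                next.addAll(shard.children); // split detected: continue with children
            }
            active = next;
        }
        return out;
    }
}
```

Finishing the parent before starting the children keeps records in order within the key range that the parent used to cover.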
18. Consuming Records with KCL
KCL (Kinesis Client Library)
Shard processing balanced across nodes
If node fails, shards are re-assigned to remaining nodes
Diagram: stream with 3 shards, app w/2 consumers (machine01, machine02); each node runs KCL with a Record Processor for every shard it owns
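Balancing and re-assignment can be pictured with a simple round-robin assignment that is recomputed when the set of live nodes changes. This is an illustration only. KCL does not use this function, and ShardBalancerSketch and assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical round-robin assignment of shards to live workers.
class ShardBalancerSketch {
    static Map<String, List<String>> assign(List<String> shards, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        Map<String, List<String>> out = new TreeMap<>();
        for (String worker : workers) {
            out.put(worker, new ArrayList<>());
        }
        for (int i = 0; i < shards.size(); i++) {
            String worker = workers.get(i % workers.size()); // spread shards evenly
            out.get(worker).add(shards.get(i));
        }
        return out;
    }
}
```

Calling assign again with only the surviving workers models the failure case: every shard ends up on a remaining node.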
19. KCL Coordination w/DynamoDB
App w/2 consumer nodes (machine01, machine02), each running KCL with its Record Processors
DynamoDB table (TableName=AppID):
lease key   checkpoint   lease counter   lease owner
shard01     …            123             machine01
shard02     …            234             machine01
shard03     …            345             machine02
lease counter continuously incremented (as a heart-beat)
App Id used as table name. Leases are coordinated through DynamoDB with conditional updates
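The heart-beat and conditional-update idea can be sketched with an in-memory map standing in for the DynamoDB table. A write succeeds only if the stored owner and counter still match what the caller last saw. LeaseTableSketch and its methods are hypothetical, and ConcurrentMap.replace is used here as a stand-in for a DynamoDB conditional update.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory lease table keyed by shard id.
class LeaseTableSketch {
    static final class Lease {
        final String owner;
        final long counter;

        Lease(String owner, long counter) {
            this.owner = owner;
            this.counter = counter;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Lease)) {
                return false;
            }
            Lease other = (Lease) o;
            return counter == other.counter && owner.equals(other.owner);
        }

        @Override
        public int hashCode() {
            return Objects.hash(owner, counter);
        }
    }

    private final ConcurrentMap<String, Lease> table = new ConcurrentHashMap<>();

    void put(String shardId, String owner, long counter) {
        table.put(shardId, new Lease(owner, counter));
    }

    Lease get(String shardId) {
        return table.get(shardId);
    }

    // Heart-beat: bump the counter only if the lease is unchanged since we last read it.
    boolean renew(String shardId, String owner, long expectedCounter) {
        return table.replace(shardId,
                new Lease(owner, expectedCounter),
                new Lease(owner, expectedCounter + 1));
    }

    // Take over a lease whose counter has stopped moving; fails if anyone updated it meanwhile.
    boolean take(String shardId, String newOwner, String expectedOwner, long expectedCounter) {
        return table.replace(shardId,
                new Lease(expectedOwner, expectedCounter),
                new Lease(newOwner, expectedCounter + 1));
    }
}
```

Because every write is conditional on the previously observed state, two nodes racing for the same shard cannot both succeed, and a node that lost its lease finds out on its next renew.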
20. Consuming Records (KCL)
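class MyProcessor implements IRecordProcessor {
    public void initialize(String shardId) { }
    public void processRecords(
            List<Record> records,
            IRecordProcessorCheckpointer checkpointer)
    {
        for (Record record : records) {
            // Process record …
        }
        try {
            checkpointer.checkpoint(); // record progress for this shard
        } catch (Exception e) {
            // checkpoint() declares checked exceptions; retry or log here
        }
    }
    public void shutdown(
            IRecordProcessorCheckpointer checkpointer,
            ShutdownReason reason) { }
}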
* KCL available for: Java, Node.js, .NET, Python, Ruby