This document discusses Amazon Kinesis, a fully managed service for real-time processing of streaming data. It provides an overview of Kinesis and how it can be used to ingest, store, and process streaming data. Examples are given of how companies are using Kinesis for applications like game analytics, digital advertising metrics, and IoT data processing. The key benefits of Kinesis (ease of use, real-time performance, elastic scalability, integration with other AWS services, and low cost) are also summarized.
2. Amazon Kinesis
Managed Service for Streaming Data Ingestion & Processing
o Origins of Kinesis
The motivation for continuous, real-time processing
Developing the ‘Right tool for the right job’
o What can you do with streaming data today?
Customer Scenarios
Current approaches
o What is Amazon Kinesis?
Kinesis is a building block
Putting data into Kinesis
Getting data from Kinesis Streams: Building applications with KCL
o Connecting Amazon Kinesis to other systems
Moving data into S3, DynamoDB, Redshift
Leveraging existing Lambda, EMR, Storm infrastructure
o Click Stream Demo
4. Some statistics about AWS Data Services
• Metering service
• Tens of millions of records per second
• Terabytes per hour
• Hundreds of thousands of sources
• Auditors guarantee 100% accuracy at month end
• Data Warehouse
• Hundreds of extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Hundreds of daily users
• Hundreds of queries per hour
6. Internal AWS Metering Service
Workload
• Tens of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage
7. Our Big Data Transition
Old requirements
• Capture huge amounts of data and process it in hourly or daily batches
New requirements
• Make decisions faster, sometimes in real-time
• Scale entire system elastically
• Make it easy to “keep everything”
• Multiple applications can process data in parallel
8. A General Purpose Data Flow
Many different technologies, at different stages of evolution
Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting
9. Big data comes from the small
{
"payerId": "Joe",
"productCode": "AmazonS3",
"clientProductCode": "AmazonS3",
"usageType": "Bandwidth",
"operation": "PUT",
"value": "22490",
"timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
“SeattlePublicWater/Kinesis/123/Realtime” – 412309129140
MQTT Record
<R,AMZN,T,G,R1>
NASDAQ OMX Record
10. Kinesis
Movement or activity in response to a stimulus.
A fully managed service for real-time processing of high-volume,
streaming data. Kinesis can store and process
terabytes of data an hour from hundreds of thousands of
sources. Data is replicated across multiple Availability
Zones to ensure high durability and availability.
12. Customer Scenarios across Industry Segments
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis
Data Types: IT infrastructure and application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Software/Technology: (1) IT server and app log ingestion, (2) IT operational metrics dashboards, (3) device/sensor operational intelligence
Digital Ad Tech/Marketing: (1) advertising data aggregation, (2) advertising metrics like coverage, yield, conversion, (3) analytics on user engagement with ads, optimized bid/buy engines
Financial Services: (1) market/financial transaction order data collection, (2) financial market data metrics, (3) fraud monitoring, value-at-risk assessment, auditing of market order data
Consumer Online/E-Commerce: (1) online customer engagement data aggregation, (2) consumer engagement metrics like page views, CTR, (3) customer clickstream analytics, recommendation engines
13. What business problem needs to be solved?
Mobile/Social Gaming
• Need: deliver continuous, real-time game insight data from 100s of game servers
• Today: custom-built solutions that are operationally complex to manage and not scalable
• Pain: delay with critical business data delivery; developer burden in building a reliable, scalable platform for real-time data ingestion/processing; slow-down of real-time customer insights
• Goal: accelerate time to market of elastic, real-time applications while minimizing operational overhead
Digital Advertising Tech
• Need: generate real-time metrics and KPIs for online ad performance for advertisers/publishers
• Today: store-and-forward fleet of log servers and a Hadoop-based processing pipeline
• Pain: lost data in the store/forward layer; operational burden in managing a reliable, scalable platform for real-time data ingestion/processing; customer insights that are batch-driven rather than real-time
• Goal: generate the freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients
24. ‘Typical’ Technology Solution Set
Solution Architecture Set
o Streaming Data Ingestion
Kafka
Flume
Kestrel / Scribe
RabbitMQ / AMQP
o Streaming Data Processing
Storm
o Do-It-Yourself (AWS-based) solution
EC2: logging/pass-through servers
EBS: holds log/other data snapshots
SQS: queue data store
S3: persistence store
EMR: workflow to ingest data from S3 and process it
o Exploring continual data ingestion & processing
Solution Architecture Considerations
Flexibility: select the most appropriate software, and configure the underlying infrastructure
Control: software and hardware can be tuned to meet specific business and scenario needs
Ongoing operational complexity: deploy and manage an end-to-end system
Infrastructure planning and maintenance: managing a reliable, scalable infrastructure
Developer/IT staff expense: developer, DevOps and IT staff time and energy expended
Software maintenance: tech and professional services support
25. Foundation for Data Streams Ingestion, Continuous Processing
Right Toolset for the Right Job
Real-time Ingest
• Highly Scalable
• Durable
• Elastic
• Replay-able Reads
Continuous Processing FX
• Load-balancing incoming streams
• Fault-tolerance, Checkpoint / Replay
• Elastic
• Enable multiple apps to process in parallel
Continuous, real-time workloads
Managed Service
Low end-to-end latency
Enable data movement into Stores/ Processing Engines
26. Kinesis Architecture
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Millions of sources producing 100s of terabytes per hour
• A front end handles authentication and authorization
• An ordered stream of events supports multiple readers
• Consuming applications: real-time dashboards and alarms; machine learning algorithms or sliding-window analytics; aggregate analysis in Hadoop or a data warehouse; aggregation and archival to S3
• Inexpensive: $0.028 per million PUTs
• Run code in response to an event and automatically manage compute
28. Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to
1MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or
removing Shards
• Replay data inside the 24-hour window
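Given the per-shard limits above (1 MB/sec and 1000 records/sec on ingest, 2 MB/sec on egress), a stream needs enough shards to cover whichever dimension is the bottleneck. A minimal sizing sketch; the class and method names are ours for illustration, not an AWS API:

```java
public class ShardSizing {
    // Per-shard limits from the slide above.
    static final double INGEST_MB_PER_SEC = 1.0;
    static final double RECORDS_PER_SEC = 1000.0;
    static final double EGRESS_MB_PER_SEC = 2.0;

    /** Smallest shard count that covers ingest bytes, ingest records, and egress bytes. */
    public static int shardsNeeded(double ingestMBps, double recordsPerSec, double egressMBps) {
        int byIngest = (int) Math.ceil(ingestMBps / INGEST_MB_PER_SEC);
        int byRecords = (int) Math.ceil(recordsPerSec / RECORDS_PER_SEC);
        int byEgress = (int) Math.ceil(egressMBps / EGRESS_MB_PER_SEC);
        return Math.max(1, Math.max(byIngest, Math.max(byRecords, byEgress)));
    }
}
```

For example, 3.5 MB/sec in at 2,500 records/sec with one consumer reading everything back needs max(4, 3, 2) = 4 shards; ingest bytes are the bottleneck.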
29. Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A partition key is supplied by the producer and used to distribute the PUTs across shards
• Kinesis MD5-hashes the supplied partition key to place the record within the hash key range of a shard
• A unique sequence number is returned to the producer upon a successful PUT call
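The routing rule above can be illustrated in plain Java: each shard owns a contiguous range of the 128-bit MD5 key space, and a record lands on whichever shard's range contains the hash of its partition key. This is an illustrative model of the documented behavior (assuming shards evenly split the key space), not AWS code; `shardForKey` is our own name:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PartitionKeyRouting {
    /** Maps a partition key to a shard index, assuming numShards shards evenly splitting the MD5 key space. */
    public static int shardForKey(String partitionKey, int numShards) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            BigInteger hash = new BigInteger(1, digest);       // 128-bit, non-negative
            BigInteger space = BigInteger.ONE.shiftLeft(128);  // 2^128 total key space
            BigInteger perShard = space.divide(BigInteger.valueOf(numShards));
            // The min(...) guards the slightly larger final range left by integer division.
            return hash.divide(perShard).min(BigInteger.valueOf(numShards - 1)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

The property that matters for producers: the mapping is deterministic, so all records sharing a partition key land on the same shard and stay ordered within it.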
33. Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at-least-once continuous processing
o Java client library, source available on GitHub
o Build & deploy your app with the KCL on your EC2 instance(s)
o The KCL is an intermediary between your application & the stream
Automatically starts a Kinesis worker for each shard
Simplifies reading by abstracting individual shards
Increases/decreases workers as the number of shards changes
Checkpoints to keep track of a worker’s location in the stream; restarts workers if they fail
o Integrates with Auto Scaling groups to redistribute workers to new instances
34. Processing Data with Kinesis : Sample RecordProcessor
public class SampleRecordProcessor implements IRecordProcessor {
    private static final Log LOG = LogFactory.getLog(SampleRecordProcessor.class);
    private static final long CHECKPOINT_INTERVAL_MILLIS = 60000L;

    private String kinesisShardId;
    private long nextCheckpointTimeInMillis;

    @Override
    public void initialize(String shardId) {
        LOG.info("Initializing record processor for shard: " + shardId);
        this.kinesisShardId = shardId;
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);
        // Process records and perform all exception handling.
        processRecordsWithRetries(records);
        // Checkpoint once every checkpoint interval.
        if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
            checkpoint(checkpointer);
            nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
        }
    }

    @Override
    public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
        // Checkpoint on graceful shutdown so the next worker resumes cleanly.
        if (reason == ShutdownReason.TERMINATE) {
            checkpoint(checkpointer);
        }
    }

    // processRecordsWithRetries(...) and checkpoint(...) are helper methods elided here.
}
35. Processing Data with Kinesis : Sample Worker
IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);
int exitCode = 0;
try {
worker.run();
} catch (Throwable t) {
LOG.error("Caught throwable while processing data.", t);
exitCode = 1;
}
36. Amazon Kinesis Connector Library
Customizable, Open Source code to Connect Kinesis with S3, Redshift, DynamoDB
S3
DynamoDB
Redshift
Kinesis
ITransformer
• Defines the transformation of records from the Amazon Kinesis stream to suit the user-defined data model
IFilter
• Excludes irrelevant records from processing
IBuffer
• Buffers the set of records to be processed, specifying a size limit (number of records) and a total byte count
IEmitter
• Makes client calls to other AWS services and persists the records stored in the buffer
37. More Options to read from Kinesis Streams
Leveraging Get APIs, existing Storm topologies
o Use the Get APIs for raw reads of Kinesis data streams
• GetRecords {Limit, ShardIterator}
• GetShardIterator {ShardId, ShardIteratorType, StartingSequenceNumber, StreamName}
o Integrate Kinesis Streams with Storm Topologies
• Bootstraps, via Zookeeper to map Shards to Spout tasks
• Fetches data from Kinesis stream
• Emits tuples and Checkpoints (in Zookeeper)
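The shard-iterator types used by the Get APIs above behave like cursors into an ordered, append-only log. A tiny in-memory model (ours, not the AWS SDK) makes the semantics concrete: TRIM_HORIZON starts at the oldest retained record, LATEST starts after the newest so only later records are seen, and AT_SEQUENCE_NUMBER starts at a given position:

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of one shard's ordered record log and its iterator types. */
public class ShardIteratorModel {
    private final List<String> log = new ArrayList<>();

    public void put(String record) { log.add(record); }

    /** Returns a starting position, mirroring GetShardIterator's ShardIteratorType. */
    public int iterator(String type, int sequenceNumber) {
        switch (type) {
            case "TRIM_HORIZON":       return 0;            // oldest retained record
            case "LATEST":             return log.size();   // only records added after this call
            case "AT_SEQUENCE_NUMBER": return sequenceNumber;
            default: throw new IllegalArgumentException(type);
        }
    }

    /** Reads up to limit records from a position, like GetRecords {Limit, ShardIterator}. */
    public List<String> getRecords(int position, int limit) {
        return new ArrayList<>(log.subList(position, Math.min(log.size(), position + limit)));
    }
}
```

In the real service the iterator is an opaque token rather than an index, and records age out of the 24-hour window, but the cursor semantics are the same.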
38. Using EMR to read and process data from Kinesis Streams
Diagram: My Website, via a Log4J Appender, pushes to Kinesis; EMR (AMI 3.0.5, running Hive, Pig, Cascading, or MapReduce) pulls from Kinesis. (Input: Dev; Processing: User)
39. Hadoop Ecosystem Implementation & Features
• Logical names: labels that define units of work (Job A vs. Job B)
• Checkpoints: create input start and end points to allow batch processing
• Error handling: service errors, retries
• Iterations: provide idempotency (pessimistic locking of the logical name)
Implementations: Hadoop InputFormat, Hive Storage Handler, Pig Load Function, Cascading Scheme and Tap
40. Intended use
• Unlock the power of Hadoop on fresh data
• Join multiple data sources for analysis
• Filter and preprocess streams
• Export and archive streaming data
41. Customers using Amazon Kinesis
Mobile/Social Gaming
• Need: deliver continuous, real-time game insight data from 100s of game servers
• Today: custom-built solutions that are operationally complex to manage and not scalable
• Pain: delay with critical business data delivery; developer burden in building a reliable, scalable platform for real-time data ingestion/processing; slow-down of real-time customer insights
• Goal: accelerate time to market of elastic, real-time applications while minimizing operational overhead
Digital Advertising Tech
• Need: generate real-time metrics and KPIs for online ad performance for advertisers/publishers
• Today: store-and-forward fleet of log servers and a Hadoop-based processing pipeline
• Pain: lost data in the store/forward layer; operational burden in managing a reliable, scalable platform for real-time data ingestion/processing; customer insights that are batch-driven rather than real-time
• Goal: generate the freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients
43. Digital Ad Tech Metering with Kinesis
Pipeline components:
• Incremental ad statistics computation
• Continuous ad metrics extraction
• Metering record archive
• Ad analytics dashboard
44. Kinesis Pricing
Simple, Pay-as-you-go, & no up-front costs
Pricing Dimension | Value
Hourly shard rate | $0.015
Per 1,000,000 PUT transactions | $0.028
• Customers specify throughput requirements in shards, which they control
• Each shard delivers 1 MB/s on ingest and 2 MB/s on egress
• Inbound data transfer is free
• EC2 instance charges apply for Kinesis processing applications
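Those two dimensions are enough to sketch a monthly estimate. A minimal calculator assuming the rates on this slide ($0.015 per shard-hour, $0.028 per million PUTs; current AWS prices may differ) and a 30-day month:

```java
public class KinesisCostEstimate {
    static final double SHARD_HOUR_RATE = 0.015;       // $ per shard-hour (from this slide)
    static final double PUT_RATE_PER_MILLION = 0.028;  // $ per 1,000,000 PUT transactions

    /** Estimated monthly stream cost in dollars (excludes EC2 for processing apps). */
    public static double monthlyCost(int shards, long putsPerSecond) {
        double hours = 24 * 30;                          // ~1 month of shard-hours
        double shardCost = shards * hours * SHARD_HOUR_RATE;
        double totalPuts = putsPerSecond * 3600.0 * hours;
        double putCost = (totalPuts / 1_000_000.0) * PUT_RATE_PER_MILLION;
        return shardCost + putCost;
    }
}
```

For example, a 2-shard stream taking 1,000 PUTs/sec: shard cost 2 × 720 × $0.015 = $21.60, PUT cost 2,592 million PUTs × $0.028 = $72.58, about $94.18/month before EC2 charges.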
46. Amazon Kinesis: Key Developer Benefits
Easy Administration
Managed service for real-time streaming
data collection, processing and analysis.
Simply create a new stream, set the
desired level of capacity, and let the
service handle the rest.
Real-time Performance
Perform continual processing on streaming
big data. Processing latencies fall to a few
seconds, compared with the minutes or
hours associated with batch processing.
High Throughput, Elastic
Seamlessly scale to match your data
throughput rate and volume. You can
easily scale up to gigabytes per second.
The service will scale up or down based on
your operational or business needs.
S3, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all
of your data in real-time & deliver to AWS
data stores of choice, with Connectors for
S3, Redshift, and DynamoDB.
Build Real-time Applications
Client libraries that enable developers to
design and operate real-time streaming
data processing applications.
Low Cost
Cost-efficient for workloads of any scale.
You can get started by provisioning a small
stream, and pay low hourly rates only for
what you use.
47. Try out Amazon Kinesis
• Try out Amazon Kinesis
• http://aws.amazon.com/kinesis/
• Thumb through the Developer Guide
• http://aws.amazon.com/documentation/kinesis/
• Visit, and Post on Kinesis Forum
• https://forums.aws.amazon.com/forum.jspa?forumID=169#
48. Online Labs | Training
Gain confidence and hands-on experience with AWS. Watch free instructional videos and explore Self-Paced Labs
Instructor Led Classes
Learn how to design, deploy and operate
highly available, cost-effective and secure
applications on AWS in courses led by
qualified AWS instructors
AWS Certification
Validate your technical expertise
with AWS and use practice exams
to help you prepare for AWS
Certification
http://aws.amazon.com/training
Editor's notes
Good morning……/ Good Afternoon……..
My name is……….
Today’s session is about ……
Fire your Questions
This is the agenda … over the next 45–50 minutes or so
We are going to talk about –
customers moving from batch to continuous – how kinesis help – how it integrates with other aws services
Like many other Amazon Web Services –
Let's talk about the genesis –
Hourly server logs for alarms – billing alerts – AWS piece of business
Other applications are from amazon.com business – what is popular item – fraud detection – right away instead of hours back
Metering services is related to any action that you take on the AWS platform for ex. Reading from dynamodb deliver content from CF and so on……
Like many of you, we’re dealing with rapid increases in data
In our endeavor to increase the operational efficiency and reduce costs
Our problem was not just how to scale to handle the data but how to increase the speed at which we analyze and take action on that data.
Just to give some numbers …. Read the slide number
As will be obvious to you, metering services are the most critical function of any business, because loss of data could mean loss of revenue, and in certain cases delayed analysis could also mean a loss
They no longer remain simple – billing alerts are required in near real time – fraud detection
Lets visualize this ... massive volume
We realized that there was this lifecycle that evolved – INGEST (many sources) – PROCESS (Simple queries like aggregates) – STORE – ANALYZE (Complex queries such as trends)
Pain points cropped up
Apart from S3.....
That problem area and raised customer expectations framed a new requirement set ....
Many different technologies are and have been available for a long time now; however, managing those on your own is a challenge, and not many of these pieces have been tested or are easy to manage at that scale.
However, not all of them have necessarily been used at that scale, and in some cases the use case itself might not be an apt fit.
We also realized that there were lots of different formats, but almost all were small in size, say < 1 KB
MQTT records are used by IOT devices
Datasets are growing
Need to make fast decisions
Potentially millions of sources
Data records are really small
Needs to be auditable
Need to be durable
Needs to be less expensive
The benefits of real-time processing apply to pretty much every scenario where new data is being generated on a continual basis.
We see it pervasively across all the big data areas we’ve looked at, and in pretty much every industry segment whose customers we’ve spoken to.
What do customers tell us –
Log processing – grab – ingest and do analysis – classic example –
Continual extraction model – start getting metrics on a sliding window basis – say 5mins – stock variations – feeds from NASDAQ or other stock providers to avoid exposure
Real time analytics – take some type of action based on data coming in say last few mins logical extension to the last two options
One feed into other feed
Customers tend to start with mundane, unglamorous applications, such as system log ingestion and processing.
They then progress over time to ever-more sophisticated forms of real-time processing.
Initially they may simply process their data streams to produce simple metrics and reports, and perform simple actions in response, such as emitting alarms over metrics that have breached their warning thresholds.
Eventually they start to do more sophisticated forms of data analysis in order to obtain deeper understanding of their data, such as applying machine learning algorithms.
In the long run they can be expected to apply a multiplicity of complex stream and event processing algorithms to their data.
We then looked outside the company for Specific use cases
Alert when public sentiment changes
Customers want to respond quickly to social media feeds
Twitter Firehose: ~450 million events per day, or 10.1 MB/sec1
Respond to changes in the stock market
Customers want to compute value-at-risk or rebalance portfolios based on stock prices
NASDAQ Ultra Feed: 230 GB/day, or 8.1 MB/sec1
Aggregate data before loading in external data store
Customers have billions of click stream records arriving continuously from servers
Large, aggregated PUTs to S3 are cost effective
Internet marketing company: 20MB/sec
Recommendation and Personalization engines
Nasdaq sending in all market order data across exchanges in real-time into Kinesis , so that their real-time processing workers (built with Kinesis libraries) would maintain a consolidated audit trail that collects and accurately identifies every order, cancellation, modification and trade execution for all exchange-listed equities and options across all U.S. markets.
Finland-based Supercell is one of the fastest-growing social game developers and their two games Hay Day and Clash of Clans attract more than 8.5 million players on iOS and Android devices every day. “Our player base has scaled at an incredible pace. Using AWS means we can rely on AWS tools in managing our infrastructure to match our growth,” said Sami Yliharju, Services Lead at Supercell. “We are using Amazon Kinesis for real-time delivery of game insight data sent by hundreds of our game engine servers. Amazon Kinesis enables our business-critical analytics and dashboard applications to reliably get the data streams they need, without delays. Amazon Kinesis also offloads a lot of developer burden in building a real-time, streaming data ingestion platform, and enables Supercell to focus on delivering games that delight players worldwide.”
Bizo, an AdTech company – ingesting customers online advertising performance data in real-time to generating real-time metrics, KPIs about Ad performance, conversion, yield, and more. They would previously do this end of day with a Hadoop job.
Pros/ Cons
(+) Flexibility: Customers can evaluate and select the most appropriate software and underlying infrastructure for their use case. Additional flexibility to those using cloud based infrastructure for running streaming data systems.
(+) Control: Software and hardware can be tuned to meet specific business and scenario needs.
(+) No software licensing costs: Mostly open source software so no direct licensing costs or cloud based software.
(-) Operational complexity: deployment, management, and end-to-end systems administration of the underlying hardware/services and all software components involved.
(-) Hardware/ Infrastructure maintenance: Direct and indirect costs associated with hardware/ service infrastructure evaluation, capacity planning, and provisioning for steady state, peak and durability.
(-) Unsupported Software: Some component software are not maintained anymore, e.g. Scribe, in other cases the software is a pre-version 1 release such as Kafka/ Storm without vendor support.
(-) Developer/ IT staff overhead: Developers, Devops and IT staff spend time and energy in managing the streaming data infrastructure and lesser time on creating streaming applications for their business.
[2 minutes]
KINESIS is a new service that scales elastically for near realtime processing of streaming big data. The service will store large streams of data in durable, consistent storage, reliably, for near realtime processing of data by an elastically scalable fleet of data processing servers. Large streams means millions of records per second, GBs of data per second and near real-time means order of a few seconds
Streaming data processing has two layers: a storage layer and a processing layer. The storage layer needs to support specialized ordering and consistency semantics that enable fast, inexpensive, and replayable reads and writes of large streams of data. The stream is the storage layer in Kinesis. The processing layer is responsible for reading data from the storage layer, processing that data, and notifying the storage layer to delete data that is no longer needed. The Kinesis client library supports the processing layer: customers compile the library into their data processing application, and the library notifies the application (the Kinesis worker) when there is new data to process. The Kinesis control plane works with Kinesis workers to solve scalability and fault tolerance problems in the processing layer.
Kinesis provides High Availability, Elasticity and Scale.
The fixed unit of scale in a stream is called a shard
Shards can be split to scale up or merged to scale down
MQTT – Financial record – Log entry
Partition Key
- Idea is to be able to do a basic aggregation of the Data coming in and identify
- Best practices for a partition key would be – customer id / serverId /
Partition Keys
When records are added to a Kinesis Stream a Partition Key must be provided to Kinesis so that the record can be directed at the appropriate shard within the stream
Kinesis currently will generate an MD5 hash of the provided partition key to determine the proper shard to place the record
It is important to not exceed the per shard data rate of any one unique partition key
Kinesis guarantees that records will be placed in order with in a data stream for each unique partition key
Three clicks and you are done – a fully managed service to store and manage data
Assistance with sizing
Refer to S3
Always route to the same shard
Example if partition keys are – customer id or the server which is sending the logs – sections of the website
Because of shard you can do basic aggregation of the
Lots of way to process streaming data, just like there are lots of ways to process batch data with Hadoop
Open source from AWS
Challenges of real-time applications on the consumer side
- The application has to keep pace with the stream
- If it falls behind, you lose the value of real-time streaming
- So this app has to be distributed + fault tolerant + scale + handle multiple shards
- Recover from failure and start from the point of failure
- Although low-level interfaces are available, doing all this is complicated
- So the KCL simplifies this – it provides checkpointing + keeps state in DynamoDB + launches workers, scales them and fails them over + simplifies reading from shards – just write your business logic
Kinesis Client library simplifies parallel processing of streaming big data by allowing customers to write simple applications for processing records. Kinesis is a Java library that customers compile into their data processing application. The application with the Kinesis library is called a Kinesis Worker. Kinesis’s library notifies the customer’s processing code when there is new data to process. Kinesis Workers will often process data and write output to an external data store such as DynamoDB, EMR, Redshift, or S3. Kinesis guarantees that every record within a Kinesis Data stream is processed at least once
This is an example of how the KCL simplifies record processing.
All you need to do is start the process and shut down the process, and in between use the RecordProcessor to write your own business logic
Just one method where you write your business logic and KCL takes care of the heavylifting
The other important part of the KCL is worker management – this is sample code for the sample worker
The Kinesis Client Library gets you up and running quickly. In addition, we also have a library for connecting to other AWS services – called the Connector Library
The grey box on the top right – will be EC2 instances in your space. Within there is the Kinesis worker process in green. That is your specific application and business logic. That logic leverages our libraries that connect back to Kinesis (in pink on the left) . There is also another VIP on the GET side.
EMR connector for Kinesis – converts from the streaming model to micro-batching on EMR – use existing jobs/tasks in Hive/Pig/Cascading etc.
The EMR–Kinesis connector takes care of the change or adoption from the batch to the streaming paradigm, e.g. re-casting the stream or a shard to logical names
Do checkpointing
Hadoop handles retry of failed map tasks
Iterations allow retries
Fixed input boundaries on a stream (idempotency for reruns)
Enable multiple queries on the same input boundaries
Fresh data streaming joined with stored data
Filter and preprocess and do complex ETL functions such as route data to S3 and Hbase
Opens whole lot of new use cases
Stream of metering records.
Incremental bill computation, uploaded every few minute to Redshift so that customers can query their estimated bill.
Archive metering records, aggregated to 1 hour buckets, in S3
Billing alerts in real-time.
SDKs available across many languages or call REST APIs
We have multiple trainings that can help you get up to speed with architecting best practices and even deep-dive into specific services such as big data.
Self-paced labs are available please take advantage of the same.
We also have a number of certifications available at both Associate and Professional level
For more details please check out this URL /training