© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Introducing Amazon Kinesis
Ryan Waite, GM AWS Data Services
Adi Krishnan, Sr. PM, Amazon
March 26, 2014
Amazon Kinesis
Managed Service for Streaming Data Ingestion & Processing
• Origins of Kinesis
– The motivation for continuous, real-time processing
– Developing the ‘Right tool for the right job’
• What can you do with streaming data today?
– Customer Scenarios
– Current approaches
• What is Amazon Kinesis?
– Kinesis is a building block
– Putting data into Kinesis
– Getting data from Kinesis Streams: Building applications with KCL
• Connecting Amazon Kinesis to other systems
– Moving data into S3, DynamoDB, Redshift
– Leveraging existing EMR, Storm infrastructure
The Motivation for Continuous Processing
Some statistics about AWS Data Services
• Metering service
– 10s of millions of records per second
– Terabytes per hour
– Hundreds of thousands of sources
– Auditors guarantee 100% accuracy at month end
• Data Warehouse
– 100s of extract-transform-load (ETL) jobs every day
– Hundreds of thousands of files per load cycle
– Hundreds of daily users
– Hundreds of queries per hour
Metering Service
Internal AWS Metering Service
Workload
• 10s of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage
Our Big Data Transition
Old requirements
• Capture huge amounts of data and process it in hourly or daily batches
New requirements
• Make decisions faster, sometimes in real-time
• Scale entire system elastically
• Make it easy to “keep everything”
• Multiple applications can process data in parallel
A General Purpose Data Flow
Many different technologies, at different stages of evolution
[Diagram: Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting]
Big data comes from the small
{
"payerId": "Joe",
"productCode": "AmazonS3",
"clientProductCode": "AmazonS3",
"usageType": "Bandwidth",
"operation": "PUT",
"value": "22490",
"timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime” – 412309129140
MQTT Record
<R,AMZN ,T,G,R1>
NASDAQ OMX Record
Kinesis
Movement or activity in response to a stimulus.
A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.
Customer Scenarios across Industry Segments
Customer View
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis
Data Types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Software/Technology
– (1) IT server, app logs ingestion
– (2) IT operational metrics dashboards
– (3) Devices/sensor operational intelligence
Digital Ad Tech./Marketing
– (1) Advertising data aggregation
– (2) Advertising metrics like coverage, yield, conversion
– (3) Analytics on user engagement with ads, optimized bid/buy engines
Financial Services
– (1) Market/financial transaction order data collection
– (2) Financial market data metrics
– (3) Fraud monitoring and Value-at-Risk assessment, auditing of market order data
Consumer Online/E-Commerce
– (1) Online customer engagement data aggregation
– (2) Consumer engagement metrics like page views, CTR
– (3) Customer clickstream analytics, recommendation engines
What Biz. Problem needs to be solved?
Mobile/Social Gaming
• Deliver continuous, real-time game insight data from 100s of game servers
• Custom-built solutions are operationally complex to manage and not scalable
– Delay with critical business data delivery
– Developer burden in building a reliable, scalable platform for real-time data ingestion/processing
– Slow-down of real-time customer insights
• Accelerate time to market of elastic, real-time applications while minimizing operational overhead
Digital Advertising Tech.
• Generate real-time metrics and KPIs for online ad performance for advertisers/publishers
• Store + Forward fleet of log servers, and Hadoop-based processing pipeline
– Lost data with Store/Forward layer
– Operational burden in managing a reliable, scalable platform for real-time data ingestion/processing
– Batch-driven real-time customer insights
• Generate freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients
Solution Architecture Set
• Streaming Data Ingestion
– Kafka
– Flume
– Kestrel / Scribe
– RabbitMQ / AMQP
• Streaming Data Processing
– Storm
• Do-it-yourself (AWS) based solution
– EC2: Logging/pass-through servers
– EBS: holds log/other data snapshots
– SQS: Queue data store
– S3: Persistence store
– EMR: workflow to ingest data from S3 and process
• Exploring Continual Data Ingestion & Processing
‘Typical’ Technology Solution Set
Solution Architecture Considerations
Flexibility: Select the most appropriate software, and configure underlying infrastructure
Control: Software and hardware can be tuned to meet specific business and scenario needs
Ongoing Operational Complexity: Deploy and manage an end-to-end system
Infrastructure planning and maintenance: Managing a reliable, scalable infrastructure
Developer/IT staff expense: Developers, DevOps, and IT staff time and energy expended
Software Maintenance: Tech. and professional services support
Foundation for Data Streams Ingestion, Continuous Processing
Right Toolset for the Right Job
Real-time Ingest
• Highly Scalable
• Durable
• Elastic
• Replay-able Reads
Continuous Processing FX
• Load-balancing incoming streams
• Fault-tolerance, Checkpoint / Replay
• Elastic
• Enable multiple apps to process in parallel
Enable data movement into Stores/ Processing Engines
Managed Service
Low end-to-end latency
Continuous, real-time workloads
Kinesis Architecture
[Diagram: Amazon Web Services, three Availability Zones (AZ)]
• Sources: millions of sources producing 100s of terabytes per hour
• Front End: Authentication, Authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
– Aggregate and archive to S3
– Real-time dashboards and alarms
– Machine learning algorithms or sliding window analytics
– Aggregate analysis in Hadoop or a data warehouse
Inexpensive: $0.028 per million puts
Amazon Kinesis – An Overview
Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to 1 MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or removing Shards
• Replay data within the 24-hour window
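The per-shard limits above suggest a simple sizing rule: take the largest shard count demanded by ingest bandwidth, put rate, or egress bandwidth. The following is a minimal sketch of that arithmetic, not an official sizing tool; the class name, method name, and the workload figures in `main` are hypothetical.

```java
public class ShardSizing {
    // Per-shard limits as stated on the slide.
    static final double INGEST_MB_PER_SEC = 1.0;
    static final double PUTS_PER_SEC = 1000.0;
    static final double EGRESS_MB_PER_SEC = 2.0;

    /** Smallest shard count that satisfies all three per-shard limits (at least 1). */
    static int shardsNeeded(double ingestMbPerSec, double putsPerSec, double egressMbPerSec) {
        int byIngest = (int) Math.ceil(ingestMbPerSec / INGEST_MB_PER_SEC);
        int byPuts = (int) Math.ceil(putsPerSec / PUTS_PER_SEC);
        int byEgress = (int) Math.ceil(egressMbPerSec / EGRESS_MB_PER_SEC);
        return Math.max(1, Math.max(byIngest, Math.max(byPuts, byEgress)));
    }

    public static void main(String[] args) {
        // Hypothetical workload: 5 MB/s written, 12,000 puts/s, 8 MB/s read by consumers.
        System.out.println(shardsNeeded(5.0, 12000.0, 8.0));
    }
}
```

In this hypothetical workload the put rate, not bandwidth, is the binding constraint, which is why all three dimensions are checked.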
Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
• Kinesis MD5-hashes the supplied partition key to place the record within the hash key range of a Shard
• A unique Sequence # is returned to the Producer upon a successful PUT call
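How a partition key lands on a shard can be sketched with the standard library alone. The example below computes the MD5 digest of a key as an unsigned 128-bit integer and maps it to a shard index. It assumes, purely for illustration, that the hash space is split into equal-width ranges; real streams report each shard's actual hash key range, and the class and method names here are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PartitionKeyHashing {
    // MD5 yields 128 bits, so hash keys fall in [0, 2^128 - 1].
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Index of the owning shard, ASSUMING shardCount equal-width hash key ranges. */
    static int shardIndex(String partitionKey, int shardCount) {
        BigInteger rangeWidth = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeWidth).intValue();
        // Any remainder of the hash space is assigned to the last shard.
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        System.out.println(shardIndex("partitionKey", 4));
    }
}
```

Because the mapping is deterministic, all records with the same partition key go to the same shard, which is what preserves their relative order.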
Creating and Sizing a Kinesis Stream
Getting Started with Kinesis – Writing to a Stream
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord
{
"StreamName": "exampleStreamName",
"Data": "XzxkYXRhPl8x",
"PartitionKey": "partitionKey"
}
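The `Data` field in the request body is Base64-encoded. As a minimal sketch, the snippet below rebuilds the body shown above from the raw bytes `_<data>_1`, which encode to the `XzxkYXRhPl8x` value in the request. The class and method names are hypothetical, and request signing and sending are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PutRecordBody {
    /** Builds a PutRecord JSON body; the Data field carries the Base64-encoded payload. */
    static String build(String streamName, byte[] data, String partitionKey) {
        String encoded = Base64.getEncoder().encodeToString(data);
        // The sample values need no JSON escaping; a real producer should
        // use a JSON library (or the AWS SDK) rather than String.format.
        return String.format(
            "{\"StreamName\": \"%s\", \"Data\": \"%s\", \"PartitionKey\": \"%s\"}",
            streamName, encoded, partitionKey);
    }

    public static void main(String[] args) {
        byte[] payload = "_<data>_1".getBytes(StandardCharsets.UTF_8);
        System.out.println(build("exampleStreamName", payload, "partitionKey"));
    }
}
```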
Sending & Reading Data from Kinesis Streams
Sending
• HTTP Post
• AWS SDK
• LOG4J
• Flume
• Fluentd
Reading
• Get* APIs
• Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce
Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at-least-once, continuous processing
• Java client library, source available on GitHub
• Build & deploy app with KCL on your EC2 instance(s)
• KCL is the intermediary between your application & stream
– Automatically starts a Kinesis Worker for each shard
– Simplifies reading by abstracting individual shards
– Increases / decreases Workers as # of shards changes
– Checkpoints to keep track of a Worker’s location in the stream; restarts Workers if they fail
• Integrates with Auto Scaling groups to redistribute workers to new instances
Processing Data with Kinesis : Sample RecordProcessor
public class SampleRecordProcessor implements IRecordProcessor {
    // This slide excerpt omits the fields (LOG, kinesisShardId, nextCheckpointTimeInMillis,
    // CHECKPOINT_INTERVAL_MILLIS), the helpers processRecordsWithRetries() and checkpoint(),
    // and the shutdown() method that IRecordProcessor also requires.

    @Override
    public void initialize(String shardId) {
        LOG.info("Initializing record processor for shard: " + shardId);
        this.kinesisShardId = shardId;
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);
        // Process records and perform all exception handling.
        processRecordsWithRetries(records);
        // Checkpoint once every checkpoint interval.
        if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
            checkpoint(checkpointer);
            nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
        }
    }
}
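The interval-based checkpointing in the sample above can be isolated into a small, dependency-free sketch. Everything here is a hypothetical stand-in: `Checkpointer` is not the KCL interface, records are plain strings, and the clock is passed in so the behaviour is easy to follow.

```java
import java.util.List;

public class PeriodicCheckpointSketch {
    /** Hypothetical stand-in for the KCL checkpointer; not the library's interface. */
    interface Checkpointer {
        void checkpoint();
    }

    private final long intervalMillis;
    private long nextCheckpointTimeInMillis;

    PeriodicCheckpointSketch(long intervalMillis, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.nextCheckpointTimeInMillis = nowMillis + intervalMillis;
    }

    /** Processes a batch, then checkpoints only if the interval has elapsed. */
    boolean processBatch(List<String> records, Checkpointer checkpointer, long nowMillis) {
        int processed = 0;
        for (String record : records) {
            processed++; // Application-specific work on each record would go here.
        }
        if (nowMillis > nextCheckpointTimeInMillis) {
            checkpointer.checkpoint();
            nextCheckpointTimeInMillis = nowMillis + intervalMillis;
            return true;
        }
        return false;
    }
}
```

Checkpointing less often reduces checkpoint traffic, at the price of more records being replayed after a failure. This is why processing is described as at-least-once and why record handling should tolerate duplicates.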
Processing Data with Kinesis : Sample Worker
IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);
int exitCode = 0;
try {
    worker.run();
} catch (Throwable t) {
    LOG.error("Caught throwable while processing data.", t);
    exitCode = 1;
}
Amazon Kinesis Connector Library
Customizable, open-source code to connect Kinesis with S3, Redshift, and DynamoDB
[Diagram: Kinesis → S3, DynamoDB, Redshift]
ITransformer
• Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model
IFilter
• Excludes irrelevant records from the processing
IBuffer
• Buffers the set of records to be processed by specifying size limit (# of records) & total byte count
IEmitter
• Makes client calls to other AWS services and persists the records stored in the buffer
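The four roles above form a transform, filter, buffer, emit pipeline. The sketch below shows that flow using standard Java functional interfaces as simplified stand-ins. It does not reproduce the Connector Library's actual interface signatures, it buffers by record count only (no byte limit), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

public class ConnectorPipelineSketch<R, T> {
    private final Function<R, T> transformer;  // role of ITransformer
    private final Predicate<T> filter;         // role of IFilter (true = keep the record)
    private final int maxBufferedRecords;      // role of IBuffer (record-count limit only)
    private final Consumer<List<T>> emitter;   // role of IEmitter
    private final List<T> buffer = new ArrayList<>();

    ConnectorPipelineSketch(Function<R, T> transformer, Predicate<T> filter,
                            int maxBufferedRecords, Consumer<List<T>> emitter) {
        this.transformer = transformer;
        this.filter = filter;
        this.maxBufferedRecords = maxBufferedRecords;
        this.emitter = emitter;
    }

    /** Transforms one record, drops it if filtered out, and emits when the buffer is full. */
    void accept(R record) {
        T item = transformer.apply(record);
        if (!filter.test(item)) {
            return;
        }
        buffer.add(item);
        if (buffer.size() >= maxBufferedRecords) {
            flush();
        }
    }

    /** Hands a copy of the buffered records to the emitter and clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        emitter.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A caller might, for example, parse strings into integers, keep only even values, and emit in batches of two, calling `flush()` at shutdown so that a partially filled buffer is not lost.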
More Options to read from Kinesis Streams
Leveraging Get APIs, existing Storm topologies
• Use the Get APIs for raw reads of Kinesis data streams
– GetRecords {Limit, ShardIterator}
– GetShardIterator {ShardId, ShardIteratorType, StartingSequenceNumber, StreamName}
• Integrate Kinesis Streams with Storm topologies
– Bootstraps via ZooKeeper to map Shards to Spout tasks
– Fetches data from Kinesis stream
– Emits tuples and checkpoints (in ZooKeeper)
Using EMR to read, and process data from Kinesis Streams
[Diagram: Input (User, Dev) → My Website → Log4J Appender → push to Kinesis → pull from Kinesis → Processing on EMR – AMI 3.0.5 (Hive, Pig, Cascading, MapReduce)]
Hadoop ecosystem Implementation & Features
• Logical names
– Labels that define units of work (Job A vs Job B)
• Checkpoints
– Creating input start and end points to allow batch processing
• Error Handling
– Service errors
– Retries
• Iterations
– Provide idempotency (pessimistic locking of the Logical name)
Hadoop Input format
Hive Storage Handler
Pig Load Function
Cascading Scheme and Tap
Intended use
• Unlock the power of Hadoop on fresh data
– Join multiple data sources for analysis
– Filter and preprocess streams
– Export and archive streaming data
Customers using Amazon Kinesis
Under NDA
Gaming Analytics with Amazon Kinesis
Digital Ad. Tech Metering with Kinesis
[Diagram labels: Continuous Ad Metrics Extraction; Incremental Ad. Statistics Computation; Metering Record Archive; Ad Analytics Dashboard]
Kinesis Pricing
Simple, Pay-as-you-go, & no up-front costs
Pricing Dimension                 Value
Hourly Shard Rate                 $0.015
Per 1,000,000 PUT transactions    $0.028
• Customers specify throughput requirements in shards that they control
• Each Shard delivers 1 MB/s on ingest, and 2 MB/s on egress
• Inbound data transfer is free
• EC2 instance charges apply for Kinesis processing applications
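The two pricing dimensions above combine into a simple estimate. The sketch below uses the March 2014 rates from this slide; the class name, method name, and workload are hypothetical, and EC2 charges for processing applications are deliberately not included.

```java
public class KinesisCostSketch {
    // Rates as listed on the slide (March 2014).
    static final double HOURLY_SHARD_RATE_USD = 0.015;
    static final double USD_PER_MILLION_PUTS = 0.028;

    /** Shard-hour cost plus PUT cost for a steady workload; excludes EC2 charges. */
    static double costUsd(int shards, double putsPerSecond, int hours) {
        double shardCost = shards * hours * HOURLY_SHARD_RATE_USD;
        double totalPuts = putsPerSecond * 3600.0 * hours;
        double putCost = totalPuts / 1_000_000.0 * USD_PER_MILLION_PUTS;
        return shardCost + putCost;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2 shards at a steady 1,000 puts/s for 720 hours (30 days).
        System.out.printf("%.3f%n", costUsd(2, 1000.0, 720));
    }
}
```

For this hypothetical workload the PUT charge (about $72.58) outweighs the shard-hour charge ($21.60), so batching small events into fewer, larger records can matter more than shard count.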
Easy Administration
Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
Real-time Performance
Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
High Throughput. Elastic
Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
S3, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB.
Build Real-time Applications
Client libraries that enable developers to design and operate real-time streaming data processing applications.
Low Cost
Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
Amazon Kinesis: Key Developer Benefits
Try out Amazon Kinesis
• Try out Amazon Kinesis
– http://aws.amazon.com/kinesis/
• Thumb through the Developer Guide
– http://aws.amazon.com/documentation/kinesis/
• Visit, and Post on Kinesis Forum
– https://forums.aws.amazon.com/forum.jspa?forumID=169#
Thank you!

Contenu connexe

Tendances

Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture PatternsAmazon Web Services
 
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018Amazon Web Services
 
深入淺出 Amazon Database Migration Service
深入淺出 Amazon Database Migration Service 深入淺出 Amazon Database Migration Service
深入淺出 Amazon Database Migration Service Amazon Web Services
 
Data Protection in Transit and at Rest
Data Protection in Transit and at RestData Protection in Transit and at Rest
Data Protection in Transit and at RestAmazon Web Services
 
Filenet test
Filenet testFilenet test
Filenet testkflana26
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317Amazon Web Services
 
Google File System
Google File SystemGoogle File System
Google File Systemnadikari123
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...Amazon Web Services Korea
 
Moving From Actions & Behaviors to Microservices
Moving From Actions & Behaviors to MicroservicesMoving From Actions & Behaviors to Microservices
Moving From Actions & Behaviors to MicroservicesJeff Potts
 
IoT & Azure (EventHub)
IoT & Azure (EventHub)IoT & Azure (EventHub)
IoT & Azure (EventHub)Mirco Vanini
 
Disaster Recovery Options with AWS
Disaster Recovery Options with AWSDisaster Recovery Options with AWS
Disaster Recovery Options with AWSAmazon Web Services
 

Tendances (20)

Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
 
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018
Deep Dive on AWS CloudHSM (SEC358-R1) - AWS re:Invent 2018
 
深入淺出 Amazon Database Migration Service
深入淺出 Amazon Database Migration Service 深入淺出 Amazon Database Migration Service
深入淺出 Amazon Database Migration Service
 
Data Protection in Transit and at Rest
Data Protection in Transit and at RestData Protection in Transit and at Rest
Data Protection in Transit and at Rest
 
Filenet test
Filenet testFilenet test
Filenet test
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317
 
Google File System
Google File SystemGoogle File System
Google File System
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...
만들자! 데이터 기반의 스마트 팩토리 - 문태양 AWS 솔루션즈 아키텍트 / 배권 팀장, OCI 정보통신 :: AWS Summit Seou...
 
Moving From Actions & Behaviors to Microservices
Moving From Actions & Behaviors to MicroservicesMoving From Actions & Behaviors to Microservices
Moving From Actions & Behaviors to Microservices
 
AWS Architecture - GDC 2014
AWS Architecture - GDC 2014AWS Architecture - GDC 2014
AWS Architecture - GDC 2014
 
AWS 101
AWS 101AWS 101
AWS 101
 
Auto scaling
Auto scalingAuto scaling
Auto scaling
 
IoT & Azure (EventHub)
IoT & Azure (EventHub)IoT & Azure (EventHub)
IoT & Azure (EventHub)
 
Disaster Recovery Options with AWS
Disaster Recovery Options with AWSDisaster Recovery Options with AWS
Disaster Recovery Options with AWS
 

Similaire à Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce

Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Web Services
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAmazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Amazon Web Services
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxRenjithPillai26
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 

Similaire à Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce (20)

Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Real-Time Streaming Data on AWS
Real-Time Streaming Data on AWSReal-Time Streaming Data on AWS
Real-Time Streaming Data on AWS
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
What's new in AWS?
What's new in AWS?What's new in AWS?
What's new in AWS?
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 

Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce

  • 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Introducing Amazon Kinesis Ryan Waite, GM AWS Data Services Adi Krishnan, Sr. PM, Amazon March 26, 2014
  • 2. Amazon Kinesis
      Managed Service for Streaming Data Ingestion & Processing
      o Origins of Kinesis
          – The motivation for continuous, real-time processing
          – Developing the ‘Right tool for the right job’
      o What can you do with streaming data today?
          – Customer Scenarios
          – Current approaches
      o What is Amazon Kinesis?
          – Kinesis is a building block
          – Putting data into Kinesis
          – Getting data from Kinesis Streams: Building applications with KCL
      o Connecting Amazon Kinesis to other systems
          – Moving data into S3, DynamoDB, Redshift
          – Leveraging existing EMR, Storm infrastructure
  • 3. The Motivation for Continuous Processing
  • 4. Some statistics about AWS Data Services
      • Metering service
          – Tens of millions of records per second
          – Terabytes per hour
          – Hundreds of thousands of sources
          – Auditors guarantee 100% accuracy at month end
      • Data Warehouse
          – Hundreds of extract-transform-load (ETL) jobs every day
          – Hundreds of thousands of files per load cycle
          – Hundreds of daily users
          – Hundreds of queries per hour
  • 6. Internal AWS Metering Service
      Workload
      • Tens of millions of records/sec
      • Multiple TB per hour
      • 100,000s of sources
      Pain points
      • Doesn’t scale elastically
      • Customers want real-time alerts
      • Expensive to operate
      • Relies on eventually consistent storage
  • 7. Our Big Data Transition
      Old requirements
      • Capture huge amounts of data and process it in hourly or daily batches
      New requirements
      • Make decisions faster, sometimes in real-time
      • Scale entire system elastically
      • Make it easy to “keep everything”
      • Multiple applications can process data in parallel
  • 8. A General Purpose Data Flow Many different technologies, at different stages of evolution Client/Sensor Aggregator Continuous Processing Storage Analytics + Reporting ?
  • 9. Big data comes from the small
      Metering Record: { "payerId": "Joe", "productCode": "AmazonS3", "clientProductCode": "AmazonS3", "usageType": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": "1216674828" }
      Common Log Entry: 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
      Syslog Entry: <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
      MQTT Record: “SeattlePublicWater/Kinesis/123/Realtime” – 412309129140
      NASDAQ OMX Record: <R,AMZN ,T,G,R1>
  • 10. Kinesis: movement or activity in response to a stimulus. A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.
  • 12. Customer Scenarios across Industry Segments
      Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis
      Data Types: IT infrastructure, Applications logs, Social media, Fin. Market data, Web Clickstreams, Sensors, Geo/Location data
      • Software/Technology
          – (1) IT server, App logs ingestion
          – (2) IT operational metrics dashboards
          – (3) Devices/Sensor Operational Intelligence
      • Digital Ad Tech./Marketing
          – (1) Advertising Data aggregation
          – (2) Advertising metrics like coverage, yield, conversion
          – (3) Analytics on User engagement with Ads, Optimized bid/buy engines
      • Financial Services
          – (1) Market/Financial Transaction order data collection
          – (2) Financial market data metrics
          – (3) Fraud monitoring, Value-at-Risk assessment, Auditing of market order data
      • Consumer Online/E-Commerce
          – (1) Online customer engagement data aggregation
          – (2) Consumer engagement metrics like page views, CTR
          – (3) Customer clickstream analytics, Recommendation engines
  • 13. What Biz. Problem needs to be solved? Mobile/ Social Gaming Digital Advertising Tech. Deliver continuous/ real-time delivery of game insight data by 100’s of game servers Generate real-time metrics, KPIs for online ad performance for advertisers/ publishers Custom-built solutions operationally complex to manage, & not scalable Store + Forward fleet of log servers, and Hadoop based processing pipeline • Delay with critical business data delivery • Developer burden in building reliable, scalable platform for real-time data ingestion/ processing • Slow-down of real-time customer insights • Lost data with Store/ Forward layer • Operational burden in managing reliable, scalable platform for real-time data ingestion/ processing • Batch-driven real-time customer insights Accelerate time to market of elastic, real-time applications – while minimizing operational overhead Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
  • 14. ‘Typical’ Technology Solution Set
      Solution Architecture Set
      o Streaming Data Ingestion
          – Kafka
          – Flume
          – Kestrel / Scribe
          – RabbitMQ / AMQP
      o Streaming Data Processing
          – Storm
      o Do-It-yourself (AWS) based solution
          – EC2: Logging/pass-through servers
          – EBS: holds log/other data snapshots
          – SQS: Queue data store
          – S3: Persistence store
          – EMR: workflow to ingest data from S3 and process
      o Exploring Continual data Ingestion & Processing
      Solution Architecture Considerations
      • Flexibility: Select the most appropriate software, and configure underlying infrastructure
      • Control: Software and hardware can be tuned to meet specific business and scenario needs
      • Ongoing Operational Complexity: Deploy, and manage an end-to-end system
      • Infrastructure planning and maintenance: Managing a reliable, scalable infrastructure
      • Developer/IT staff expense: Developers, Devops and IT staff time and energy expended
      • Software Maintenance: Tech. and professional services support
  • 15. Foundation for Data Streams Ingestion, Continuous Processing: Right Toolset for the Right Job
      Real-time Ingest
      • Highly Scalable
      • Durable
      • Elastic
      • Replay-able Reads
      Continuous Processing FX
      • Load-balancing incoming streams
      • Fault-tolerance, Checkpoint / Replay
      • Elastic
      • Enable multiple apps to process in parallel
      Enable data movement into Stores/Processing Engines
      Managed Service
      Low end-to-end latency
      Continuous, real-time workloads
  • 16. Kinesis Architecture Amazon Web Services AZ AZ AZ Durable, highly consistent storage replicates data across three data centers (availability zones) Aggregate and archive to S3 Millions of sources producing 100s of terabytes per hour Front End Authentication Authorization Ordered stream of events supports multiple readers Real-time dashboards and alarms Machine learning algorithms or sliding window analytics Aggregate analysis in Hadoop or a data warehouse Inexpensive: $0.028 per million puts
  • 17. Amazon Kinesis – An Overview
  • 18. Kinesis Stream: Managed ability to capture and store data
      • Streams are made of Shards
      • Each Shard ingests data up to 1 MB/sec, and up to 1000 TPS
      • Each Shard emits up to 2 MB/sec
      • All data is stored for 24 hours
      • Scale Kinesis streams by adding or removing Shards
      • Replay data inside of the 24-hour window
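The per-shard limits above lend themselves to a quick sizing calculation. The following is a minimal sketch, assuming the stream needs enough shards to satisfy whichever limit (ingest bandwidth, PUT rate, or egress bandwidth) is the tightest. The class and method names are hypothetical, and the constants are simply the figures quoted on the slide.

```java
// Hypothetical helper: estimates a shard count from the per-shard limits on the slide.
final class ShardSizing {
    // Per-shard limits as quoted in the deck (2014 figures).
    static final double INGEST_MB_PER_SEC = 1.0;
    static final double PUTS_PER_SEC = 1000.0;
    static final double EGRESS_MB_PER_SEC = 2.0;

    /** Returns the smallest shard count that satisfies all three limits (at least 1). */
    static int shardsNeeded(double ingestMbPerSec, double putsPerSec, double egressMbPerSec) {
        double byIngest = ingestMbPerSec / INGEST_MB_PER_SEC;
        double byPuts = putsPerSec / PUTS_PER_SEC;
        double byEgress = egressMbPerSec / EGRESS_MB_PER_SEC;
        double tightest = Math.max(byIngest, Math.max(byPuts, byEgress));
        return Math.max(1, (int) Math.ceil(tightest));
    }

    public static void main(String[] args) {
        // 2.5 MB/s in, 4000 PUTs/s, 5 MB/s out: the PUT rate is the binding limit.
        System.out.println(shardsNeeded(2.5, 4000, 5.0));
    }
}
```

In this example the PUT rate dominates, so the estimate is four shards even though bandwidth alone would need only three.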
  • 19. Putting Data into Kinesis: Simple Put interface to store data in Kinesis
      • Producers use a PUT call to store data in a Stream
      • PutRecord {Data, PartitionKey, StreamName}
      • A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
      • Kinesis computes an MD5 hash of the supplied partition key and routes the record to the Shard whose hash key range contains that value
      • A unique Sequence # is returned to the Producer upon a successful PUT call
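To make the partition-key routing concrete, here is a minimal sketch of how an MD5 hash can be mapped onto shard hash key ranges. It assumes the 128-bit hash space is split into equal-width ranges, one per shard; in a real stream the ranges are whatever the service reports for each shard. `PartitionKeyRouting`, `hashKey`, and `shardIndex` are hypothetical names, not part of any AWS library.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of partition-key routing; not the service's implementation.
final class PartitionKeyRouting {

    /** Interprets the MD5 digest of the key as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Assumption: shards own equal-width slices of [0, 2^128). */
    static int shardIndex(String partitionKey, int shardCount) {
        // floor(hash * shardCount / 2^128)
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .shiftRight(128)
                .intValueExact();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Joe", "frank", "sensor-123"}) {
            System.out.println(key + " -> shard " + shardIndex(key, 4));
        }
    }
}
```

Because the mapping depends only on the key, all records with the same partition key land on the same shard, which is what preserves per-key ordering.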
  • 20. Creating and Sizing a Kinesis Stream
  • 21. Getting Started with Kinesis – Writing to a Stream
      POST / HTTP/1.1
      Host: kinesis.<region>.<domain>
      x-amz-Date: <Date>
      Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
      User-Agent: <UserAgentString>
      Content-Type: application/x-amz-json-1.1
      Content-Length: <PayloadSizeBytes>
      Connection: Keep-Alive
      X-Amz-Target: Kinesis_20131202.PutRecord

      {
        "StreamName": "exampleStreamName",
        "Data": "XzxkYXRhPl8x",
        "PartitionKey": "partitionKey"
      }
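The `Data` field in the request body is Base64-encoded. As a minimal sketch, the payload can be built with the JDK alone; the class name `PutRecordPayload` is hypothetical, request signing is omitted, and the string concatenation assumes the stream name and partition key contain no characters that need JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds only the JSON body of a PutRecord request.
final class PutRecordPayload {

    /** Base64-encodes the raw record bytes, as the Data field requires. */
    static String encodeData(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Assumption: streamName and partitionKey need no JSON escaping. */
    static String toJson(String streamName, byte[] data, String partitionKey) {
        return "{ \"StreamName\": \"" + streamName + "\", "
                + "\"Data\": \"" + encodeData(data) + "\", "
                + "\"PartitionKey\": \"" + partitionKey + "\" }";
    }

    public static void main(String[] args) {
        byte[] record = "_<data>_1".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJson("exampleStreamName", record, "partitionKey"));
    }
}
```

The sample value `XzxkYXRhPl8x` in the request above is the Base64 encoding of the text `_<data>_1`, which is the input used in `main`.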
  • 22. Sending & Reading Data from Kinesis Streams HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Reading
  • 23. Building Kinesis Processing Apps: Kinesis Client Library
      Client library for fault-tolerant, at-least-once, Continuous Processing
      o Java client library, source available on GitHub
      o Build & Deploy app with KCL on your EC2 instance(s)
      o KCL is the intermediary between your application & stream
          – Automatically starts a Kinesis Worker for each shard
          – Simplifies reading by abstracting individual shards
          – Increase / Decrease Workers as # of shards changes
          – Checkpoints to keep track of a Worker’s location in the stream, Restarts Workers if they fail
      o Integrates with AutoScaling groups to redistribute workers to new instances
  • 24. Processing Data with Kinesis: Sample RecordProcessor
      public class SampleRecordProcessor implements IRecordProcessor {

          @Override
          public void initialize(String shardId) {
              LOG.info("Initializing record processor for shard: " + shardId);
              this.kinesisShardId = shardId;
          }

          @Override
          public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
              LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);
              // Process records and perform all exception handling.
              processRecordsWithRetries(records);
              // Checkpoint once every checkpoint interval.
              if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
                  checkpoint(checkpointer);
                  nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
              }
          }
      }
  • 25. Processing Data with Kinesis: Sample Worker
      IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
      Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);
      int exitCode = 0;
      try {
          worker.run();
      } catch (Throwable t) {
          LOG.error("Caught throwable while processing data.", t);
          exitCode = 1;
      }
  • 26. Amazon Kinesis Connector Library
      Customizable, Open Source code to Connect Kinesis with S3, Redshift, DynamoDB
      • ITransformer: Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model
      • IFilter: Excludes irrelevant records from the processing
      • IBuffer: Buffers the set of records to be processed by specifying size limit (# of records) & total byte count
      • IEmitter: Makes client calls to other AWS services and persists the records stored in the buffer
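The four connector stages above form a transform, filter, buffer, emit pipeline. The sketch below illustrates that flow with plain JDK functional interfaces. It is not the Connector Library's API: `ConnectorPipelineSketch` and its members are hypothetical, the buffer flushes on record count only (the slide also mentions a byte-count limit), and the emitter is just a callback rather than a call to S3, Redshift, or DynamoDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for the transformer / filter / buffer / emitter roles.
final class ConnectorPipelineSketch<T> {
    private final Function<String, T> transformer; // raw record -> user data model
    private final Predicate<T> filter;             // true = keep the record
    private final int maxBufferedRecords;          // flush threshold (record count only)
    private final Consumer<List<T>> emitter;       // receives each flushed batch
    private final List<T> buffer = new ArrayList<>();

    ConnectorPipelineSketch(Function<String, T> transformer, Predicate<T> filter,
                            int maxBufferedRecords, Consumer<List<T>> emitter) {
        this.transformer = transformer;
        this.filter = filter;
        this.maxBufferedRecords = maxBufferedRecords;
        this.emitter = emitter;
    }

    /** Transforms one raw record, drops it if filtered out, otherwise buffers it. */
    void accept(String rawRecord) {
        T item = transformer.apply(rawRecord);
        if (!filter.test(item)) {
            return;
        }
        buffer.add(item);
        if (buffer.size() >= maxBufferedRecords) {
            flush();
        }
    }

    /** Emits a copy of the buffered records, then clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        emitter.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        ConnectorPipelineSketch<Integer> pipeline = new ConnectorPipelineSketch<>(
                Integer::parseInt, n -> n % 2 == 0, 2,
                batch -> System.out.println("emit " + batch));
        for (String raw : new String[] {"1", "2", "3", "4", "6"}) {
            pipeline.accept(raw);
        }
        pipeline.flush(); // emit whatever is left in the buffer
    }
}
```

Keeping each stage behind its own small interface is what lets the same pipeline target different stores: only the emitter changes.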
  • 27. More Options to read from Kinesis Streams Leveraging Get APIs, existing Storm topologies o Use the Get APIs for raw reads of Kinesis data streams • GetRecords {Limit, ShardIterator} • GetShardIterator {ShardId, ShardIteratorType, StartingSequenceNumber, StreamName} o Integrate Kinesis Streams with Storm Topologies • Bootstraps, via Zookeeper to map Shards to Spout tasks • Fetches data from Kinesis stream • Emits tuples and Checkpoints (in Zookeeper)
  • 28. Using EMR to read, and process data from Kinesis Streams Processing Input • User • Dev My Website Kinesis Log4J Appender push to Kinesis EMR – AMI 3.0.5 Hive Pig Cascading MapReduce pull from
  • 29. Hadoop ecosystem Implementation & Features • Logical names –Labels that define units of work (Job A vs Job B) • Checkpoints – Creating an input start and end points to allow batch processing • Error Handling –Service errors –Retries • Iterations – Provide idempotency (pessimistic locking of the Logical name) Hadoop Input format Hive Storage Handler Pig Load Function Cascading Scheme and Tap
  • 30. Intended use • Unlock the power of Hadoop on fresh data – Join multiple data sources for analysis – Filter and preprocess streams – Export and archive streaming data
  • 31. Customers using Amazon Kinesis Mobile/ Social Gaming Digital Advertising Tech. Deliver continuous/ real-time delivery of game insight data by 100’s of game servers Generate real-time metrics, KPIs for online ad performance for advertisers/ publishers Custom-built solutions operationally complex to manage, & not scalable Store + Forward fleet of log servers, and Hadoop based processing pipeline • Delay with critical business data delivery • Developer burden in building reliable, scalable platform for real-time data ingestion/ processing • Slow-down of real-time customer insights • Lost data with Store/ Forward layer • Operational burden in managing reliable, scalable platform for real-time data ingestion/ processing • Batch-driven real-time customer insights Accelerate time to market of elastic, real-time applications – while minimizing operational overhead Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
  • 32. Under NDA Gaming Analytics with Amazon Kinesis
  • 33. Digital Ad. Tech Metering with Kinesis Continuous Ad Metrics Extraction Incremental Ad. Statistics Computation Metering Record Archive Ad Analytics Dashboard
  • 34. Kinesis Pricing: Simple, Pay-as-you-go, & no up-front costs
      Pricing Dimension: Value
      Hourly Shard Rate: $0.015
      Per 1,000,000 PUT transactions: $0.028
      • Customers specify throughput requirements in shards, which they control
      • Each Shard delivers 1 MB/s on ingest, and 2 MB/s on egress
      • Inbound data transfer is free
      • EC2 instance charges apply for Kinesis processing applications
  • 35. Amazon Kinesis: Key Developer Benefits
      • Easy Administration: Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
      • Real-time Performance: Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
      • High Throughput. Elastic: Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
      • S3, Redshift, & DynamoDB Integration: Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB.
      • Build Real-time Applications: Client libraries that enable developers to design and operate real-time streaming data processing applications.
      • Low Cost: Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
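The two pricing dimensions above combine into a simple estimate: shard-hours plus PUT transactions. A minimal sketch follows, using only the rates quoted on the slide (2014 figures, likely out of date). `KinesisCostEstimate` is a hypothetical name, and EC2 charges for the processing applications are deliberately left out.

```java
// Hypothetical cost estimate based on the two rates quoted in the deck.
final class KinesisCostEstimate {
    static final double SHARD_HOUR_USD = 0.015;       // hourly shard rate
    static final double PER_MILLION_PUTS_USD = 0.028; // per 1,000,000 PUT transactions

    /** Shard-hour cost plus PUT cost; excludes EC2 charges for consumers. */
    static double estimateUsd(int shards, double hours, long putCount) {
        double shardCost = shards * hours * SHARD_HOUR_USD;
        double putCost = (putCount / 1_000_000.0) * PER_MILLION_PUTS_USD;
        return shardCost + putCost;
    }

    public static void main(String[] args) {
        // Two shards for one day, receiving 100 PUTs per second in total.
        long puts = 100L * 86_400L;
        System.out.println(estimateUsd(2, 24, puts));
    }
}
```

For the scenario in `main`, the shard-hours come to $0.72 and the 8.64 million PUTs to about $0.24, so roughly $0.96 for the day.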
  • 36. Try out Amazon Kinesis • Try out Amazon Kinesis – http://aws.amazon.com/kinesis/ • Thumb through the Developer Guide – http://aws.amazon.com/documentation/kinesis/ • Visit, and Post on Kinesis Forum – https://forums.aws.amazon.com/forum.jspa?forumID=169#
  • 37. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Introducing Amazon Kinesis Managed Service for Streaming Data Ingestion, & Processing Ryan Waite, GM AWS Data Services Adi Krishnan, Sr. PM, Amazon March 26, 2014 Thank you!