SlideShare une entreprise Scribd logo
1  sur  50
Télécharger pour lire hors ligne
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Kinesis & Big Data
Meld real-time streaming with EMR (Hadoop), &
Redshift (Data Warehousing)
Adi Krishnan, AWS Product Management, @adityak
Daniel Mintz, Director of BI, Upworthy, @danielmintz
July 10, 2014
Amazon Kinesis & Big Data
o Motivations for Stream Processing
 Origins: Internal metering capability
 Expanding the big data processing landscape
o Customer view on streaming data
o Amazon Kinesis Overview
 Amazon Kinesis Architecture
 Kinesis concepts & Demo
o Amazon Elastic MapReduce and Kinesis
 EMR connector morphs Kinesis streamed data into Hadoop framework
 Applying Hadoop frameworks to streaming data
o Amazon Kinesis and Redshift:
 Upworthy presents “Shrinking Redshift data load times from 24 hours to 10 minutes”
 Presented by Daniel Mintz, Director of Business Intelligence, Upworthy
The Motivation for Continuous Processing
Origins: Internal AWS Metering Capability
Workload
• 10s of millions records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent
storage
Expanding the Big Data Processing Landscape
• Query Engine Approach
• Pre-computations such as
indices and dimensional views
improve performance
• Historical, structured data
• HIVE/SQL-on-Hadoop/ M-R/
Spark
• Batch programs, or other
abstractions breaking down
into MR style computations
• Historical, Semi-structured
data
• Custom computations of
relative simple complexity
• Continuous Processing –
filters, sliding windows,
aggregates – on infinite data
streams
• Semi/Structured data,
generated continuously in
real-time
Traditional Data Warehousing Hadoop Style Processing Stream Processing
A Generalized Data Flow
Many different technologies, at different stages of evolution
Client/Sensor Aggregator Continuous
Processing
Storage Analytics +
Reporting
Our Big Data Transition
Old Posture
• Capture huge amounts of data
and process it in hourly or daily
batches
New Requirements
• Make decisions faster,
sometimes in real-time
• Scale entire system elastically
• Make it easy to “keep
everything”
• Multiple applications can
process data in parallel
Foundation for Data Streams Ingestion, Continuous Processing
Right Toolset for the Right Job
Real-time Ingest
• Highly Scalable
• Durable
• Elastic
• Replay-able Reads
Continuous Processing FX
• Load-balancing incoming streams
• Fault-tolerance, Checkpoint / Replay
• Elastic
• Enable multiple apps to process in parallel
Enable data movement into Stores/ Processing Engines
Managed Service
Low end-to-end latency
Continuous, real-time workloads
Customer View
Scenarios Accelerated Ingest-Transform-Load Continual Metrics/ KPI Extraction Responsive Data Analysis
Data Types IT infrastructure, Applications logs, Social media, Fin. Market data, Web Clickstreams, Sensors, Geo/Location data
Software/
Technology
IT server , App logs ingestion IT operational metrics dashboards Devices / Sensor Operational
Intelligence
Digital Ad Tech./
Marketing
Advertising Data aggregation Advertising metrics like coverage, yield,
conversion
Analytics on User engagement with
Ads, Optimized bid/ buy engines
Financial Services Market/ Financial Transaction order data
collection
Financial market data metrics Fraud monitoring, and Value-at-Risk
assessment, Auditing of market order
data
Consumer Online/
E-Commerce
Online customer engagement data
aggregation
Consumer engagement metrics like
page views, CTR
Customer clickstream analytics,
Recommendation engines
Customer Scenarios across Industry Segments
1 2 3
Big streaming data comes from the small
{
"payerId": "Joe",
"productCode": "AmazonS3",
"clientProductCode": "AmazonS3",
"usageType": "Bandwidth",
"operation": "PUT",
"value": "22490",
"timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700]
"GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z
mymachine.example.com evntslog - ID47
[exampleSDID@32473 iut="3"
eventSource="Application"
eventID="1011"][examplePriority@32473
class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime”
– 412309129140
MQTT Record <R,AMZN ,T,G,R1>
NASDAQ OMX Record
What Biz. Problem needs to be solved?
Mobile/ Social Gaming Digital Advertising Tech.
Deliver continuous/ real-time delivery of game
insight data by 100’s of game servers
Generate real-time metrics, KPIs for online ad
performance for advertisers/ publishers
Custom-built solutions operationally complex to
manage, & not scalable
Store + Forward fleet of log servers, and Hadoop based
processing pipeline
• Delay with critical business data delivery
• Developer burden in building reliable, scalable
platform for real-time data ingestion/ processing
• Slow-down of real-time customer insights
• Lost data with Store/ Forward layer
• Operational burden in managing reliable, scalable
platform for real-time data ingestion/ processing
• Batch-driven real-time customer insights
Accelerate time to market of elastic, real-time
applications – while minimizing operational
overhead
Generate freshest analytics on advertiser performance
to optimize marketing spend, and increase
responsiveness to clients
Amazon Kinesis
Managed Service for streaming data ingestion, and processing
Amazon Kinesis Architecture
Amazon Web Services
AZ AZ AZ
Durable, highly consistent storage replicates data
across three data centers (availability zones)
Aggregate and
archive to S3
Millions of
sources producing
100s of terabytes
per hour
Front
End
Authentication
Authorization
Ordered stream
of events supports
multiple readers
Real-time
dashboards
and alarms
Machine learning
algorithms or
sliding window
analytics
Aggregate analysis
in Hadoop or a
data warehouse
Inexpensive: $0.028 per million puts
Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to
1MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by splitting
or merging Shards
• Replay data inside of 24Hr.
Window
Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by producer and used to
distribute the PUTs across Shards
• Kinesis MD5 hashes supplied partition key over the
hash key range of a Shard
• A unique Sequence # is returned to the Producer
upon a successful PUT call
Building Kinesis Processing Apps: Kinesis Client Library
Open Source library for fault-tolerant, continuous processing apps
• Java client library, source available on Github
• Build app with KCL on your EC2 instance(s)
• KCL is intermediary b/w your application & stream
• Automatically starts a Kinesis Worker for each shard
• Simplifies reading by abstracting individual shards
• Increase / Decrease Workers as # of shards changes
• Checkpoints to keep track of a Worker’s location in
the stream, Restarts Workers if they fail
• Deploy app on your EC2 instances
• Integrates with AutoScaling groups to redistribute workers
to new instances
Amazon Kinesis Connector Library
Open Source code to Connect Kinesis with S3, Redshift, DynamoDB
S3
DynamoDB
Redshift
Kinesis
ITransformer
• Defines the
transformation
of records
from the
Amazon
Kinesis stream
in order to suit
the user-
defined data
model
IFilter
• Excludes
irrelevant
records from
the
processing.
IBuffer
• Buffers the set
of records to
be processed
by specifying
size limit (# of
records)& total
byte count
IEmitter
• Makes client
calls to other
AWS services
and persists
the records
stored in the
buffer.
Sending & Reading Data from Kinesis Streams
HTTP Post
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Kinesis Client
Library
+
Connector Library
Apache
Storm
Amazon Elastic
MapReduce
Sending Consuming
AWS Mobile
SDK
Amazon Kinesis & Elastic MapReduce
Amazon Elastic MapReduce (EMR)
Managed Service for Hadoop based data processing
• Managed service
• Easy to tune clusters and trim
costs
• Support for multiple data stores
• Unique features that ensure
customer success on AWS
Applying batch processing to streamed data
Client/ Sensor Recording
Service
Aggregator/
Sequencer
Continuous
processor for
dashboard
Storage
Analytics and
Reporting
Amazon Kinesis Amazon EMR
Streaming Data Ingestion
What would this look like?
Processing
Input
• User
• Dev
My Website
Kinesis
Log4J
Appender
push to
Kinesis
EMR
Hive
Pig
Cascading
MapReduce
pull from
• Features offered starting EMR AMI 3.0.4
– Simply spin up the EMR cluster like normal
• Logical names
– Labels that define units of work (Job A vs Job B)
• Iterations
– Provide idempotency (pessimistic locking of the Logical name)
• Checkpoints
– Creating an input start and end points to allow batch processing
Features and Functionality
Iterations – the run of a Job
Iteration 1 Iteration 2 Iteration 3 Iteration 4
Trim Horizon seqID
1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00
-24 hours
Logical Name
Stream
NOW
Latest seqID
Next
Logical Names & Checkpointing – allows
efficient batching
Kinesis Stream
NOW
Latest seqIDTrim Horizon seqID
-24 hours
Logical Name
Stream
• Dynamo DB
Metadata Storage
Logical Name A
Mapper 1
Mapper 2
Mapper 3
Mapper 4
Logical Name B
Mapper 1
Mapper 2
Mapper 3
Mapper 4
Each Kinesis shard maps 1:1 to a Hadoop
map task
1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00
Mapper 2
Kinesis
Hadoop
Next
Logical Name
Mapper 1
Shard 2
Shard 1
Mapper 2
Mapper 1
Mapper 2
Mapper 1
Mapper 2
Mapper 1
-24 hours
Start seq ID End seq ID
NOW
Latest seqID
Handling stream scaling events
Trim Horizon seqID
1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00
Mapper 2
Kinesis
Hadoop
Logical Name
Mapper 1
Shard 2
Shard 1
Mapper 2
Mapper 1
Mapper 2
Mapper 1
3
Mapper 3
1
2
4
5
Split
Shard 2
Shard 1
Shard 2
Shard 1
Shard 3
Shard 2
Shard 1
Shard 3
Split Merge
-24 hours
Latest seqID
NOW
Next
• InputFormat handles service errors
– Throttling: 400
– Service unavailable errors : 503
– Internal server 500
– Http Client exceptions : socket connection timeout
• Hadoop handles retry of failed map tasks
• Iterations allow retrys
– Fixed input boundaries on a stream (idempotency for reruns)
– Enable multiple queries on the same input boundaries
Handling errors
Hadoop Ecosystem Implementation
• Hadoop Input format
• Hive Storage Handler
• Pig Load Function
• Cascading Scheme and
Tap
• Join multiple data
sources for analysis
• Filter and preprocess
streams
• Export and archive
streaming data
Use CasesImplementations
Writing to Kinesis using Log4J
Option Default Description
log4j.appender.KINESIS.streamName AccessLog
Stream
Stream name to which data is to be published.
log4j.appender.KINESIS.encoding UTF-8 Encoding used to convert log message strings into
bytes before sending to Amazon Kinesis.
log4j.appender.KINESIS.maxRetries 3 Maximum number of retries when calling Kinesis APIs
to publish a log message.
log4j.appender.KINESIS.backoffInterval 100ms Milliseconds to wait before a retry attempt.
log4j.appender.KINESIS.threadCount 20 Number of parallel threads for publishing logs to
configured Kinesis stream.
log4j.appender.KINESIS.bufferSize 2000 Maximum number of outstanding log messages to
keep in memory.
log4j.appender.KINESIS.shutdownTimeout 30 Seconds to send buffered messages before application
JVM quits normally.
.error("Cannot find resource XYX… go do something about it!");
Run the Ad-hoc Hive Query
Run the Ad-hoc Hive Query
Amazon Kinesis & Redshift
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
24 Hours to 10 Minutes
How Upworthy’s Data Pipeline uses Kinesis
Daniel Mintz, Director Business Intelligence, Upworthy, @danielmintz
What’s Upworthy
• We’ve been called
– “Social media with a mission” by our About Page
– “The fastest growing media site of all time” by Fast Company
– “The Fastest Rising Startup” by The Crunchies
– “That thing that’s all over my newsfeed” by my annoyed friends
– “The most data-driven media company in history” by me,
optimistically
What We Do
• We aim to drive
massive amounts of
attention to things that
really matter.
• We do that by finding,
packaging, and
distributing great,
meaningful content.
Our Use Case
When We Started
• Had built a data warehouse from scratch
• Hadoop-based batch workflow
• Nightly ETL cycle
• 2.5 Engineers
• Wanted to do all three:
– Comprehensive
– Ad Hoc
– Real-Time
The Decision
• Speed up our current system, rather than
building a parallel one
• Had looked at alternative stream processors
– Cost
– Maintenance
• Comfortable with concept of application log
stream
How It Works
• Log Drain receives, formats, batches and zips
• PUTs 50k GZIP batches on Kinesis stream
• Three types of Kinesis consumers:
1. Archiver – Batch and write permanent record
2. Stats – Filter, sample and count; Report to StatHat
3. Transformer – Filter, batch, validate; writes temporary BSVs to S3
• Database Importer handles manifest files.
• S3 handles garbage collection.
Our system now
• Stats:
– Average: ~1085 events/second
– Peak: ~2500 events/second
• Data is available in Redshift < 10 min
• Kinesis has been cheap, stable, and gives us
redundancy and resiliency.
• Computation model that’s easy to reason about
Resiliency
• When something goes
wrong, you have 24
hours.
• Timestamp at outset.
Track lag at each step.
• Bigger workers (more
CPU, RAM, deeper
queues) can catch us
up very fast.
What We’ve Learned
Some Lessons
• You can use one pipeline for everything.
• High-cardinality fact data belongs in Kinesis.
• EDN works well with Kinesis.
• We prefer explicit checkpointing. (Your mileage may
vary.)
• Languages that run on the JVM can take advantage of
AWS Client Libraries.
Kinesis Pricing
Simple, Pay-as-you-go, & no up-front costs
Pricing Dimension Value
Hourly Shard Rate $0.015
Per 1,000,000 PUT
transactions:
$0.028
• Customers specify throughput requirements in shards, that they control
• Each Shard delivers 1 MB/s on ingest, and 2MB/s on egress
• Inbound data transfer is free
• EC2 instance charges apply for Kinesis processing applications
Canonical Data flows with Amazon Kinesis
Continuous Metric
Extraction
Incremental Stats
Computation
Record Archiving
Live Dashboard
Try out Amazon Kinesis
• Try out Amazon Kinesis
– http://aws.amazon.com/kinesis/
• Thumb through the Developer Guide
– http://aws.amazon.com/documentation/kinesis/
• Test drive the sample app
– https://github.com/awslabs/amazon-kinesis-data-visualization-sample
• Kinesis Connector Framework
– https://github.com/awslabs/amazon-kinesis-connectors
• Read EMR-Kinesis FAQs
– http://aws.amazon.com/elasticmapreduce/faqs/#kinesis-connector
• Visit, and Post on Kinesis Forum
– https://forums.aws.amazon.com/forum.jspa?forumID=169#
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Thank You!
Adi Krishnan, Product Management, AWS

Contenu connexe

Tendances

효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...Amazon Web Services Korea
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Amazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Web Services Korea
 
Migrating Your Databases to AWS - Tools and Services.pdf
Migrating Your Databases to AWS -  Tools and Services.pdfMigrating Your Databases to AWS -  Tools and Services.pdf
Migrating Your Databases to AWS - Tools and Services.pdfAmazon Web Services
 
SplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunk
 
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...Edureka!
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019Amazon Web Services Korea
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scaleVinay Kumar Chella
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsAmazon Web Services
 
Kuberntes Ingress with Kong
Kuberntes Ingress with KongKuberntes Ingress with Kong
Kuberntes Ingress with KongNebulaworks
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesKai Wähner
 
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교InfraEngineer
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
What is Cloud Cost Optimization and Management? How It Works?
What is Cloud Cost Optimization and Management? How It Works?What is Cloud Cost Optimization and Management? How It Works?
What is Cloud Cost Optimization and Management? How It Works?Veritis Group, Inc
 
Bringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered BackendsBringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered BackendsApigee | Google Cloud
 
Cloud Native Application
Cloud Native ApplicationCloud Native Application
Cloud Native ApplicationVMUG IT
 
Azure databases for PostgreSQL, MySQL and MariaDB
Azure databases for PostgreSQL, MySQL and MariaDB Azure databases for PostgreSQL, MySQL and MariaDB
Azure databases for PostgreSQL, MySQL and MariaDB rockplace
 

Tendances (20)

효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
효과적인 NoSQL (Elasticahe / DynamoDB) 디자인 및 활용 방안 (최유정 & 최홍식, AWS 솔루션즈 아키텍트) :: ...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Amazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB DayAmazon Aurora Deep Dive (김기완) - AWS DB Day
Amazon Aurora Deep Dive (김기완) - AWS DB Day
 
Migrating Your Databases to AWS - Tools and Services.pdf
Migrating Your Databases to AWS -  Tools and Services.pdfMigrating Your Databases to AWS -  Tools and Services.pdf
Migrating Your Databases to AWS - Tools and Services.pdf
 
SplunkLive! Splunk for Security
SplunkLive! Splunk for SecuritySplunkLive! Splunk for Security
SplunkLive! Splunk for Security
 
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...
Amazon Cloud | Amazon Cloud Computing Tutorial | AWS Tutorial | AWS Training ...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
AWS에서 Kubernetes 실행하기 - 황경태 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
 
Cassandra serving netflix @ scale
Cassandra serving netflix @ scaleCassandra serving netflix @ scale
Cassandra serving netflix @ scale
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your Applications
 
Kuberntes Ingress with Kong
Kuberntes Ingress with KongKuberntes Ingress with Kong
Kuberntes Ingress with Kong
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and Architectures
 
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교
[MeetUp][3rd] 아무도 이야기하지 않는 클라우드 3사 솔직 비교
 
DevOps on AWS
DevOps on AWSDevOps on AWS
DevOps on AWS
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
What is Cloud Cost Optimization and Management? How It Works?
What is Cloud Cost Optimization and Management? How It Works?What is Cloud Cost Optimization and Management? How It Works?
What is Cloud Cost Optimization and Management? How It Works?
 
Bringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered BackendsBringing API Management to AWS Powered Backends
Bringing API Management to AWS Powered Backends
 
Cloud Native Application
Cloud Native ApplicationCloud Native Application
Cloud Native Application
 
Azure databases for PostgreSQL, MySQL and MariaDB
Azure databases for PostgreSQL, MySQL and MariaDB Azure databases for PostgreSQL, MySQL and MariaDB
Azure databases for PostgreSQL, MySQL and MariaDB
 

En vedette

Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Amazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisAmazon Web Services
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveYifeng Jiang
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 
Collecting and analyzing sensor data with hadoop or other no sql databases
Collecting and analyzing sensor data with hadoop or other no sql databasesCollecting and analyzing sensor data with hadoop or other no sql databases
Collecting and analyzing sensor data with hadoop or other no sql databasesMatteo Redaelli
 
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...Amazon Web Services
 
Going Serverless with CQRS on AWS
Going Serverless with CQRS on AWSGoing Serverless with CQRS on AWS
Going Serverless with CQRS on AWSAnton Udovychenko
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Web Services
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studyViet-Trung TRAN
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleSajjad Zaheer
 
Workshop AWS IoT @ IoT World Paris
Workshop AWS IoT @ IoT World ParisWorkshop AWS IoT @ IoT World Paris
Workshop AWS IoT @ IoT World ParisJulien SIMON
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
 
Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
Real-time Data Processing with Amazon DynamoDB Streams and AWS LambdaReal-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
Real-time Data Processing with Amazon DynamoDB Streams and AWS LambdaAmazon Web Services
 
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...Amazon Web Services
 

En vedette (20)

Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Collecting and analyzing sensor data with hadoop or other no sql databases
Collecting and analyzing sensor data with hadoop or other no sql databasesCollecting and analyzing sensor data with hadoop or other no sql databases
Collecting and analyzing sensor data with hadoop or other no sql databases
 
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...
AWS Summit London 2014 | Amazon Elastic MapReduce Deep Dive and Best Practice...
 
Going Serverless with CQRS on AWS
Going Serverless with CQRS on AWSGoing Serverless with CQRS on AWS
Going Serverless with CQRS on AWS
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
 
giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
Workshop AWS IoT @ IoT World Paris
Workshop AWS IoT @ IoT World ParisWorkshop AWS IoT @ IoT World Paris
Workshop AWS IoT @ IoT World Paris
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
 
Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
Real-time Data Processing with Amazon DynamoDB Streams and AWS LambdaReal-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda
 
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We...
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 

Similaire à Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce

AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAmazon Web Services
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Web Services
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesAmazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeSingleStore
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon KinesisAmazon Web Services
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesisJampp
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 

Similaire à Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce (20)

What's new in AWS?
What's new in AWS?What's new in AWS?
What's new in AWS?
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
 
Real-Time Streaming Data on AWS
Real-Time Streaming Data on AWSReal-Time Streaming Data on AWS
Real-Time Streaming Data on AWS
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Data & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real TimeData & Analytics Forum: Moving Telcos to Real Time
Data & Analytics Forum: Moving Telcos to Real Time
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapReduce

  • 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Amazon Kinesis & Big Data Meld real-time streaming with EMR (Hadoop), & Redshift (Data Warehousing) Adi Krishnan, AWS Product Management, @adityak Daniel Mintz, Director of BI, Upworthy, @danielmintz July 10, 2014
  • 2. Amazon Kinesis & Big Data o Motivations for Stream Processing  Origins: Internal metering capability  Expanding the big data processing landscape o Customer view on streaming data o Amazon Kinesis Overview  Amazon Kinesis Architecture  Kinesis concepts & Demo o Amazon Elastic MapReduce and Kinesis  EMR connector morphs Kinesis streamed data into Hadoop framework  Applying Hadoop frameworks to streaming data o Amazon Kinesis and Redshift:  Upworthy presents “Shrinking Redshift data load times from 24 hours to 10 minutes”  Presented by Daniel Mintz, Director of Business Intelligence, Upworthy
  • 3. The Motivation for Continuous Processing
  • 4. Origins: Internal AWS Metering Capability Workload • 10s of millions records/sec • Multiple TB per hour • 100,000s of sources Pain points • Doesn’t scale elastically • Customers want real-time alerts • Expensive to operate • Relies on eventually consistent storage
  • 5. Expanding the Big Data Processing Landscape • Query Engine Approach • Pre-computations such as indices and dimensional views improve performance • Historical, structured data • HIVE/SQL-on-Hadoop/ M-R/ Spark • Batch programs, or other abstractions breaking down into MR style computations • Historical, Semi-structured data • Custom computations of relative simple complexity • Continuous Processing – filters, sliding windows, aggregates – on infinite data streams • Semi/Structured data, generated continuously in real-time Traditional Data Warehousing Hadoop Style Processing Stream Processing
  • 6. A Generalized Data Flow Many different technologies, at different stages of evolution Client/Sensor Aggregator Continuous Processing Storage Analytics + Reporting
  • 7. Our Big Data Transition Old Posture • Capture huge amounts of data and process it in hourly or daily batches New Requirements • Make decisions faster, sometimes in real-time • Scale entire system elastically • Make it easy to “keep everything” • Multiple applications can process data in parallel
  • 8. Foundation for Data Streams Ingestion, Continuous Processing Right Toolset for the Right Job Real-time Ingest • Highly Scalable • Durable • Elastic • Replay-able Reads Continuous Processing FX • Load-balancing incoming streams • Fault-tolerance, Checkpoint / Replay • Elastic • Enable multiple apps to process in parallel Enable data movement into Stores/ Processing Engines Managed Service Low end-to-end latency Continuous, real-time workloads
  • 10. Scenarios Accelerated Ingest-Transform-Load Continual Metrics/ KPI Extraction Responsive Data Analysis Data Types IT infrastructure, Applications logs, Social media, Fin. Market data, Web Clickstreams, Sensors, Geo/Location data Software/ Technology IT server , App logs ingestion IT operational metrics dashboards Devices / Sensor Operational Intelligence Digital Ad Tech./ Marketing Advertising Data aggregation Advertising metrics like coverage, yield, conversion Analytics on User engagement with Ads, Optimized bid/ buy engines Financial Services Market/ Financial Transaction order data collection Financial market data metrics Fraud monitoring, and Value-at-Risk assessment, Auditing of market order data Consumer Online/ E-Commerce Online customer engagement data aggregation Consumer engagement metrics like page views, CTR Customer clickstream analytics, Recommendation engines Customer Scenarios across Industry Segments 1 2 3
  • 11. Big streaming data comes from the small { "payerId": "Joe", "productCode": "AmazonS3", "clientProductCode": "AmazonS3", "usageType": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": "1216674828" } Metering Record 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 Common Log Entry <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"] Syslog Entry “SeattlePublicWater/Kinesis/123/Realtime” – 412309129140 MQTT Record <R,AMZN ,T,G,R1> NASDAQ OMX Record
  • 12. What Biz. Problem needs to be solved? Mobile/ Social Gaming Digital Advertising Tech. Deliver continuous/ real-time delivery of game insight data by 100’s of game servers Generate real-time metrics, KPIs for online ad performance for advertisers/ publishers Custom-built solutions operationally complex to manage, & not scalable Store + Forward fleet of log servers, and Hadoop based processing pipeline • Delay with critical business data delivery • Developer burden in building reliable, scalable platform for real-time data ingestion/ processing • Slow-down of real-time customer insights • Lost data with Store/ Forward layer • Operational burden in managing reliable, scalable platform for real-time data ingestion/ processing • Batch-driven real-time customer insights Accelerate time to market of elastic, real-time applications – while minimizing operational overhead Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
  • 13. Amazon Kinesis Managed Service for streaming data ingestion, and processing
  • 14. Amazon Kinesis Architecture Amazon Web Services AZ AZ AZ Durable, highly consistent storage replicates data across three data centers (availability zones) Aggregate and archive to S3 Millions of sources producing 100s of terabytes per hour Front End Authentication Authorization Ordered stream of events supports multiple readers Real-time dashboards and alarms Machine learning algorithms or sliding window analytics Aggregate analysis in Hadoop or a data warehouse Inexpensive: $0.028 per million puts
  • 15. Kinesis Stream: Managed ability to capture and store data • Streams are made of Shards • Each Shard ingests data up to 1MB/sec, and up to 1000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by splitting or merging Shards • Replay data inside of 24Hr. Window
  • 16. Putting Data into Kinesis Simple Put interface to store data in Kinesis • Producers use a PUT call to store data in a Stream • PutRecord {Data, PartitionKey, StreamName} • A Partition Key is supplied by producer and used to distribute the PUTs across Shards • Kinesis MD5 hashes supplied partition key over the hash key range of a Shard • A unique Sequence # is returned to the Producer upon a successful PUT call
  • 17. Building Kinesis Processing Apps: Kinesis Client Library Open Source library for fault-tolerant, continuous processing apps • Java client library, source available on Github • Build app with KCL on your EC2 instance(s) • KCL is intermediary b/w your application & stream • Automatically starts a Kinesis Worker for each shard • Simplifies reading by abstracting individual shards • Increase / Decrease Workers as # of shards changes • Checkpoints to keep track of a Worker’s location in the stream, Restarts Workers if they fail • Deploy app on your EC2 instances • Integrates with AutoScaling groups to redistribute workers to new instances
  • 18. Amazon Kinesis Connector Library Open Source code to Connect Kinesis with S3, Redshift, DynamoDB S3 DynamoDB Redshift Kinesis ITransformer • Defines the transformation of records from the Amazon Kinesis stream in order to suit the user- defined data model IFilter • Excludes irrelevant records from the processing. IBuffer • Buffers the set of records to be processed by specifying size limit (# of records)& total byte count IEmitter • Makes client calls to other AWS services and persists the records stored in the buffer.
  • 19. Sending & Reading Data from Kinesis Streams HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Consuming AWS Mobile SDK
  • 20. Amazon Kinesis & Elastic MapReduce
  • 21. Amazon Elastic MapReduce (EMR) Managed Service for Hadoop based data processing • Managed service • Easy to tune clusters and trim costs • Support for multiple data stores • Unique features that ensure customer success on AWS
  • 22. Applying batch processing to streamed data Client/ Sensor Recording Service Aggregator/ Sequencer Continuous processor for dashboard Storage Analytics and Reporting Amazon Kinesis Amazon EMR Streaming Data Ingestion
  • 23. What would this look like? Processing Input • User • Dev My Website Kinesis Log4J Appender push to Kinesis EMR Hive Pig Cascading MapReduce pull from
  • 24. • Features offered starting EMR AMI 3.0.4 – Simply spin up the EMR cluster like normal • Logical names – Labels that define units of work (Job A vs Job B) • Iterations – Provide idempotency (pessimistic locking of the Logical name) • Checkpoints – Creating an input start and end points to allow batch processing Features and Functionality
  • 25. Iterations – the run of a Job Iteration 1 Iteration 2 Iteration 3 Iteration 4 Trim Horizon seqID 1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00 -24 hours Logical Name Stream NOW Latest seqID Next
  • 26. Logical Names & Checkpointing – allows efficient batching Kinesis Stream NOW Latest seqIDTrim Horizon seqID -24 hours Logical Name Stream
  • 27. • Dynamo DB Metadata Storage Logical Name A Mapper 1 Mapper 2 Mapper 3 Mapper 4 Logical Name B Mapper 1 Mapper 2 Mapper 3 Mapper 4
  • 28. Each Kinesis shard maps 1:1 to a Hadoop map task 1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00 Mapper 2 Kinesis Hadoop Next Logical Name Mapper 1 Shard 2 Shard 1 Mapper 2 Mapper 1 Mapper 2 Mapper 1 Mapper 2 Mapper 1 -24 hours Start seq ID End seq ID NOW Latest seqID
  • 29. Handling stream scaling events Trim Horizon seqID 1:00 – 7:00 7:00 – 13:00 13:00 – 19:00 19:00 – 1:00 Mapper 2 Kinesis Hadoop Logical Name Mapper 1 Shard 2 Shard 1 Mapper 2 Mapper 1 Mapper 2 Mapper 1 3 Mapper 3 1 2 4 5 Split Shard 2 Shard 1 Shard 2 Shard 1 Shard 3 Shard 2 Shard 1 Shard 3 Split Merge -24 hours Latest seqID NOW Next
  • 30. • InputFormat handles service errors – Throttling: 400 – Service unavailable errors : 503 – Internal server 500 – Http Client exceptions : socket connection timeout • Hadoop handles retry of failed map tasks • Iterations allow retrys – Fixed input boundaries on a stream (idempotency for reruns) – Enable multiple queries on the same input boundaries Handling errors
  • 31. Hadoop Ecosystem Implementation • Hadoop Input format • Hive Storage Handler • Pig Load Function • Cascading Scheme and Tap • Join multiple data sources for analysis • Filter and preprocess streams • Export and archive streaming data Use CasesImplementations
  • 32. Writing to Kinesis using Log4J Option Default Description log4j.appender.KINESIS.streamName AccessLog Stream Stream name to which data is to be published. log4j.appender.KINESIS.encoding UTF-8 Encoding used to convert log message strings into bytes before sending to Amazon Kinesis. log4j.appender.KINESIS.maxRetries 3 Maximum number of retries when calling Kinesis APIs to publish a log message. log4j.appender.KINESIS.backoffInterval 100ms Milliseconds to wait before a retry attempt. log4j.appender.KINESIS.threadCount 20 Number of parallel threads for publishing logs to configured Kinesis stream. log4j.appender.KINESIS.bufferSize 2000 Maximum number of outstanding log messages to keep in memory. log4j.appender.KINESIS.shutdownTimeout 30 Seconds to send buffered messages before application JVM quits normally. .error("Cannot find resource XYX… go do something about it!");
  • 33. Run the Ad-hoc Hive Query
  • 34. Run the Ad-hoc Hive Query
  • 35. Amazon Kinesis & Redshift
  • 36. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. 24 Hours to 10 Minutes How Upworthy’s Data Pipeline uses Kinesis Daniel Mintz, Director Business Intelligence, Upworthy, @danielmintz
  • 37. What’s Upworthy • We’ve been called – “Social media with a mission” by our About Page – “The fastest growing media site of all time” by Fast Company – “The Fastest Rising Startup” by The Crunchies – “That thing that’s all over my newsfeed” by my annoyed friends – “The most data-driven media company in history” by me, optimistically
  • 38. What We Do • We aim to drive massive amounts of attention to things that really matter. • We do that by finding, packaging, and distributing great, meaningful content.
  • 40. When We Started • Had built a data warehouse from scratch • Hadoop-based batch workflow • Nightly ETL cycle • 2.5 Engineers • Wanted to do all three: – Comprehensive – Ad Hoc – Real-Time
  • 41. The Decision • Speed up our current system, rather than building a parallel one • Had looked at alternative stream processors – Cost – Maintenance • Comfortable with concept of application log stream
  • 42. How It Works • Log Drain receives, formats, batches and zips • PUTs 50k GZIP batches on Kinesis stream • Three types of Kinesis consumers: 1. Archiver – Batch and write permanent record 2. Stats – Filter, sample and count; Report to StatHat 3. Transformer – Filter, batch, validate; writes temporary BSVs to S3 • Database Importer handles manifest files. • S3 handles garbage collection.
  • 43. Our system now • Stats: – Average: ~1085 events/second – Peak: ~2500 events/second • Data is available in Redshift < 10 min • Kinesis has been cheap, stable, and gives us redundancy and resiliency. • Computation model that’s easy to reason about
  • 44. Resiliency • When something goes wrong, you have 24 hours. • Timestamp at outset. Track lag at each step. • Bigger workers (more CPU, RAM, deeper queues) can catch us up very fast.
  • 46. Some Lessons • You can use one pipeline for everything. • High-cardinality fact data belongs in Kinesis. • EDN works well with Kinesis. • We prefer explicit checkpointing. (Your mileage may vary.) • Languages that run on the JVM can take advantage of AWS Client Libraries.
  • 47. Kinesis Pricing Simple, Pay-as-you-go, & no up-front costs Pricing Dimension Value Hourly Shard Rate $0.015 Per 1,000,000 PUT transactions: $0.028 • Customers specify throughput requirements in shards, that they control • Each Shard delivers 1 MB/s on ingest, and 2MB/s on egress • Inbound data transfer is free • EC2 instance charges apply for Kinesis processing applications
  • 48. Canonical Data flows with Amazon Kinesis Continuous Metric Extraction Incremental Stats Computation Record Archiving Live Dashboard
  • 49. Try out Amazon Kinesis • Try out Amazon Kinesis – http://aws.amazon.com/kinesis/ • Thumb through the Developer Guide – http://aws.amazon.com/documentation/kinesis/ • Test drive the sample app – https://github.com/awslabs/amazon-kinesis-data-visualization-sample • Kinesis Connector Framework – https://github.com/awslabs/amazon-kinesis-connectors • Read EMR-Kinesis FAQs – http://aws.amazon.com/elasticmapreduce/faqs/#kinesis-connector • Visit, and Post on Kinesis Forum – https://forums.aws.amazon.com/forum.jspa?forumID=169#
  • 50. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Thank You! Adi Krishnan, Product Management, AWS