© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Radhika Ravirala, Solutions Architect, Amazon Web Services
10/26/2016
Building a Streaming Data Platform on AWS
Activities 1-3
Activity 1
Create and Populate a Kinesis
Stream
Activity 1: Create and populate a Kinesis Stream
We are going to:
A. Create an Amazon Kinesis stream
B. Create an AWS IAM user for writing to the stream
C. Write data to a Kinesis Stream
Activity 1-A: Create an Amazon Kinesis stream
1. Go to the Amazon Kinesis Streams Console
1. https://us-west-2.console.aws.amazon.com/kinesis/home
2. Click “Create Stream”
3. Specify a stream name and number of shards (2), click
“Create”
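If you prefer to script this step, here is a minimal sketch using the AWS SDK for
Python (boto3); the stream name is a placeholder, and credentials are assumed to
be configured already:

import boto3

# Assumes credentials and region are configured (e.g., via `aws configure`).
kinesis = boto3.client("kinesis", region_name="us-west-2")

# Create a stream with 2 shards, matching the console walkthrough.
kinesis.create_stream(StreamName="iot-sensor-stream", ShardCount=2)

# Stream creation is asynchronous; block until the stream is ACTIVE.
kinesis.get_waiter("stream_exists").wait(StreamName="iot-sensor-stream")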
Activity 1-B: Create an AWS IAM user
Create an AWS IAM user and attach a policy
1. Go to the AWS IAM console, click “Users”
1. https://console.aws.amazon.com/iam/home?region=us-west-2#users
2. Specify a user name (e.g., “kinesis_producer”), create, and
download credentials
3. Go to “Users” and click the user name you just created
4. Click “Permissions” tab and then “Attach Policy”
5. Search “Kinesis” and attach the AWS Managed Policy
“AmazonKinesisFullAccess” to the user.
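The same user setup can be scripted; a rough boto3 sketch (the user name is just
the one suggested above) might look like:

import boto3

iam = boto3.client("iam")

# Create the producer user.
iam.create_user(UserName="kinesis_producer")

# Attach the AWS managed policy granting full Kinesis access.
iam.attach_user_policy(
    UserName="kinesis_producer",
    PolicyArn="arn:aws:iam::aws:policy/AmazonKinesisFullAccess",
)

# Generate the access key pair you will paste into the producer tool.
keys = iam.create_access_key(UserName="kinesis_producer")["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])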
Activity 1-C: Write data to a Kinesis Stream (pt 1)
To simplify this, we are going to use a simple tool for creating sample data and
writing it to a Kinesis Stream.
1. Go to the simple data producer UI http://bit.ly/kinesis-producer
2. Copy and paste your credentials into the form. The credentials do not leave
your client (the site is hosted locally), and the site uses the AWS JavaScript
SDK over HTTPS. Outside of this workshop, do not use your credentials this
way unless you trust the provider.
3. Specify region (us-west-2) and stream name.
4. Specify a data rate between 100 and 500; pick a unique number. This will
control the rate at which records are sent to the stream.
Activity 1-C: Write data to a Kinesis Stream (pt 2)
5. Copy and paste this into the text editor
{
"sensorId": "sensor-{{random.number(100)}}",
"eventTimeStamp": "{{current.timestamp}}",
"ambientTemp": {{random.number({"min":-40, "max":60})}},
"measuredTemp": {{random.number({"min":0, "max":100})}}
}
The tool will generate JSON documents based upon the above “record format”.
The tool generates hundreds of these records and sends them in a single put
using PutRecords. Note that the tool will drain your battery, so if you don’t have
a power cord, only keep it running while we are completing the activities.
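For reference, what the tool does is roughly equivalent to the following boto3
sketch (a simplification; the stream name is the placeholder from earlier, and
Python's random module stands in for the template functions):

import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")

def make_record():
    # Mirrors the record format above.
    data = {
        "sensorId": "sensor-{}".format(random.randint(0, 99)),
        "eventTimeStamp": int(time.time()),
        "ambientTemp": random.randint(-40, 60),
        "measuredTemp": random.randint(0, 100),
    }
    # Partitioning by sensorId keeps each sensor's records ordered
    # within its shard.
    return {"Data": json.dumps(data).encode("utf-8"),
            "PartitionKey": data["sensorId"]}

# PutRecords accepts up to 500 records per call.
kinesis.put_records(
    StreamName="iot-sensor-stream",
    Records=[make_record() for _ in range(100)],
)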
Review – Kinesis Shards and Streams
You created a two-shard stream in this activity.
• How much throughput does each shard support?
• How do we determine which shard each record is saved to?
• What is a good logical partition key?
Review – Monitoring a Stream
Go to the Amazon CloudWatch Metrics Console and search
“IncomingRecords”. Select this metric for your Kinesis Stream and choose a 1
Minute SUM.
How do you determine which records are going to which shards?
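To pull the same metric programmatically, a boto3 sketch along these lines should
work (the stream name is again a placeholder):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "StreamName", "Value": "iot-sensor-stream"}],
    StartTime=now - timedelta(minutes=10),
    EndTime=now,
    Period=60,           # 1-minute buckets
    Statistics=["Sum"],  # SUM of incoming records per minute
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])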
Activity 2
Process Data using Kinesis
Analytics
Activity 2: Process Data using Kinesis Analytics
We are going to:
A. Create a Kinesis Analytics application that reads from
the Kinesis stream with IoT Sensor data
B. Add your first query
C. Add a continuous filter
D. Calculate an aggregate metric for an interesting statistic
on the incoming data
E. Stop the application
Activity 2-A: Create Kinesis Analytics Application
1. Go to the Amazon Kinesis Analytics Console
1. https://us-west-2.console.aws.amazon.com/kinesisanalytics/home?region=us-
west-2
2. Click “Create new application” and provide a name
3. Click “Connect to a source” and select the stream you created in the
previous exercise
4. Kinesis Analytics will discover the schema for you. Verify the schema and
click “Save and Continue”
5. Click “Go To SQL Editor” and then “Run Application”
You now have a running stream processing application!
Writing SQL over Streaming Data
Writing SQL over streaming data using Kinesis Analytics follows a two-part
model:
1. Create an in-application stream for storing intermediate SQL results. An in-
application stream is like a SQL table, but is continuously updated.
2. Create a PUMP which will continuously read FROM one in-application
stream and INSERT INTO a target in-application stream
SOURCE_SQL_STREAM_001 → DEDUP_PUMP → CLEAN_IOT_STREAM (Part 1)
CLEAN_IOT_STREAM → FILTER_PUMP → FILTERED_STREAM (Part 2)
CLEAN_IOT_STREAM → AGGREGATE_PUMP → AGGREGATE_STREAM (Part 3)
Activity 2-B: Add your first query (part 1)
Our first query will use SELECT DISTINCT to remove any producer-side
duplicates. Producer-side duplicates are caused by connectivity issues and
producer retries. In the SQL Editor, create a target in-application stream to store
the intermediate SQL results:
CREATE OR REPLACE STREAM CLEAN_IOT_STREAM (
ingest_time TIMESTAMP,
sensor_id VARCHAR(16),
event_timestamp VARCHAR(16),
ambient_temp DECIMAL(16,2),
measured_temp DECIMAL(16,2),
dedup_window TIMESTAMP);
Activity 2-B: Add your first query (part 2)
Perform a SELECT DISTINCT over a 1-second window, inserting into your in-
application stream. Pay attention to lower and upper case when selecting
columns (use quotes).
CREATE OR REPLACE PUMP DEDUP_PUMP AS
INSERT INTO CLEAN_IOT_STREAM
SELECT STREAM DISTINCT APPROXIMATE_ARRIVAL_TIME,
"sensorId", "eventTimeStamp", "ambientTemp",
"measuredTemp",
FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
FROM SOURCE_SQL_STREAM_001;
Activity 2-B: Add your first query (part 3)
This query removes any producer-side duplicates over a one-second window.
(In a production setting, this window would likely be larger, such as 10 seconds.)
Producer-side duplicates are most often created when an HTTP request times
out due to connectivity issues on the producer.
The window is defined by the following expression in the SELECT statement.
Note that the ROWTIME column is implicitly included in every stream query,
and represents the processing time of the application.
FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
Activity 2-C: Add a continuous filter (part 1)
A continuous filter is a common query that uses a WHERE clause to select a
portion of your data stream. Create a target in-application stream for the filtered
data.
CREATE OR REPLACE STREAM FILTERED_STREAM (
ingest_time TIMESTAMP,
sensor_id VARCHAR(16),
event_timestamp VARCHAR(16),
ambient_temp DECIMAL(16,2),
measured_temp DECIMAL(16,2),
dedup_window TIMESTAMP);
Activity 2-C: Add a continuous filter (part 2)
Create a pump that INSERTs into your FILTERED_STREAM, selects FROM
your CLEAN_IOT_STREAM, and uses a WHERE clause on the difference
between ambient and measured temperature.
CREATE OR REPLACE PUMP FILTER_PUMP AS
INSERT INTO FILTERED_STREAM
SELECT STREAM ingest_time, sensor_id, event_timestamp,
ambient_temp, measured_temp, dedup_window
FROM CLEAN_IOT_STREAM
WHERE abs(measured_temp - ambient_temp) > 20;
Activity 2-D: Calculate an aggregate metric (pt 1)
The last part of this activity is free form. Calculate a count using a tumbling
window and a GROUP BY clause.
A tumbling window is similar to a periodic report, where you specify your query
and a time range, and results are emitted at the end of the range (e.g., COUNT
the number of items by key every 10 seconds).
Activity 2-D: Calculate an aggregate metric (pt 2)
CREATE OR REPLACE STREAM AGGREGATE_STREAM (sensor_id
VARCHAR(16), avg_ambient_temp DECIMAL(16,2),
avg_measured_temp DECIMAL(16,2), measurements INTEGER);
CREATE OR REPLACE PUMP AGGREGATE_PUMP AS INSERT INTO
AGGREGATE_STREAM
SELECT STREAM sensor_id, AVG(ambient_temp),
AVG(measured_temp), COUNT(sensor_id)
FROM CLEAN_IOT_STREAM
GROUP BY sensor_id, FLOOR((CLEAN_IOT_STREAM.ROWTIME -
TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
Different types of windows
Tumbling
Sliding
Custom
Tumbling Windows
Tumbling windows are useful for periodic reports. You can use a tumbling window to compute
an average number of visitors to your website in the last 5 minutes, or the maximum over the
past hour. A single result is emitted for each key in the group as specified by the clause at the
end of the defined window.
An important characteristic of a tumbling window is that the bounds do not overlap; the start of a
new tumbling window begins with the end of the old window. For example, a 5-minute
tumbling window aggregates results in separate, non-overlapping windows covering
minutes 0 to 5, 5 to 10, and 10 to 15.
Tumbling Windows in Group By
Tumbling windows are always included in a GROUP BY
clause and use the FLOOR function. The FLOOR function
rounds its input down to the nearest integer (or time unit). To create
a tumbling time window, convert a timestamp to an integer
representation and put it in a FLOOR function to create the
window bound. Three examples are shown below:
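(The examples below are illustrative reconstructions, consistent with the
10-second window used in Activity 2-D.)

-- 1-minute tumbling window
FLOOR(CLEAN_IOT_STREAM.ROWTIME TO MINUTE)

-- 10-second tumbling window (as in Activity 2-D)
FLOOR((CLEAN_IOT_STREAM.ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
  SECOND / 10 TO SECOND)

-- 5-minute tumbling window
FLOOR((CLEAN_IOT_STREAM.ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
  MINUTE / 5 TO MINUTE)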
Activity 2-E: Stop your application
1. Go to the Amazon Kinesis Analytics main dashboard.
2. Select your application.
3. Click “Actions” and then “Stop Application”.
Review – In-Application SQL streams
Your application has multiple in-application SQL streams including
CLEAN_IOT_STREAM, FILTERED_STREAM, and AGGREGATE_STREAM.
These in-application streams are like SQL tables that are continuously
updated.
What else is unique about an in-application stream aside from its continuous
nature?
Activity 3
Deliver streaming data to S3
Activity 3: Deliver Data to S3 using Firehose
We are going to:
A. Create a Firehose delivery stream
B. Update Kinesis Analytics app
C. Check the S3 bucket for results
D. Clean up
Activity 3-A: Create Firehose delivery stream
1. Go to the Amazon Kinesis Firehose dashboard, click “Create Delivery
Stream”.
1. https://console.aws.amazon.com/firehose/home?region=us-west-2#/intro
2. Use a different name than the one you used for your Kinesis stream.
2. Select Amazon S3 for your destination, create a new bucket in Oregon, and
name it (e.g., <yourname>-filtered-iot-data). Click “Next”
3. Choose Configuration options
1. Choose a buffer size of 5 MB, and 60 seconds
2. Choose No compression or encryption
3. Turn on Error Logging
4. Select “Create new IAM Role” and then “Allow”
5. Click “Next”
4. Review and click “Create Delivery Stream”
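If you would rather script the delivery stream, a hedged boto3 sketch follows;
the role ARN, bucket ARN, and stream name are placeholders for resources
assumed to already exist (the console creates the IAM role for you):

import boto3

firehose = boto3.client("firehose", region_name="us-west-2")

# Buffering and compression settings mirror the console choices above.
firehose.create_delivery_stream(
    DeliveryStreamName="filtered-iot-delivery",
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose_delivery_role",
        "BucketARN": "arn:aws:s3:::yourname-filtered-iot-data",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
        "CompressionFormat": "UNCOMPRESSED",
    },
)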
Activity 3-B: Update Kinesis Analytics app (part 1)
1. Go to the Kinesis Analytics Console, select your
application
2. Go to the SQL Editor, click “No, I’ll do this later”
3. Identify the name of the in-application stream you used
for the filtered results (i.e. FILTERED_STREAM).
4. Click “Exit (done editing)”.
5. Click “Connect to a destination”.
Activity 3-B: Update Kinesis Analytics app (part 2)
5. Configure your destination
1. Choose the Firehose delivery stream you just created
2. Set the “.. refer to this stream as” field to the in-
application stream containing the filtered results.
3. Choose CSV as the “Output format”
4. Click “Save and continue”
6. Go back to the Kinesis Analytics dashboard, select your
application, and click “Run Application”
Activity 3-C: Check the S3 bucket for results
1. Go to the Amazon S3 console
1. https://console.aws.amazon.com/s3/home?region=us-west-
2#
2. Select the bucket you created in Activity 3-A (e.g., <yourname>-
filtered-iot-data)
3. Navigate down the folder hierarchy
4. Select an object and click “Action”, “Download”
5. Open up the file with a text editor
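You can also inspect the results from code; a minimal boto3 sketch (the bucket
name is the placeholder used earlier) could be:

import boto3

s3 = boto3.client("s3")
bucket = "yourname-filtered-iot-data"

# Firehose writes objects under a YYYY/MM/DD/HH/ key hierarchy.
resp = s3.list_objects_v2(Bucket=bucket)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one object and inspect the CSV rows Firehose delivered.
key = resp["Contents"][0]["Key"]
s3.download_file(bucket, key, "sample.csv")
print(open("sample.csv").read())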
Review – Reliable delivery to a destination
You are successfully delivering data to S3.
What are possible issues that would prevent Firehose from
delivering data?
If something went wrong, where would you look?
Activity 3-D: Clean up
1. Stop the data producer
2. Go to the Kinesis Analytics console, delete your
application
3. Go to the Kinesis Streams console, delete your stream
4. Go to the Kinesis Firehose console, delete your delivery
stream
5. Go to the Amazon S3 console, right-click on your bucket
and choose “Delete Bucket”
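A scripted cleanup, assuming the placeholder names from the earlier sketches,
might look like this (the Kinesis Analytics application is easiest to delete
from the console):

import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")
firehose = boto3.client("firehose", region_name="us-west-2")
s3 = boto3.resource("s3")

# Delete the Kinesis stream and the Firehose delivery stream.
kinesis.delete_stream(StreamName="iot-sensor-stream")
firehose.delete_delivery_stream(DeliveryStreamName="filtered-iot-delivery")

# A bucket must be empty before it can be deleted.
bucket = s3.Bucket("yourname-filtered-iot-data")
bucket.objects.all().delete()
bucket.delete()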
Some Next Steps
We have many AWS Big Data Blogs which cover more examples. Full list here. Some good
ones:
1. Kinesis Streams
1. Implement Efficient and Reliable Producers with the Amazon Kinesis Producer Library
2. Presto and Amazon Kinesis
3. Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming
4. Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams
2. Kinesis Firehose
1. Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda
2. Building a Near Real-Time Discovery Platform with AWS
3. Kinesis Analytics
1. Writing SQL on Streaming Data With Amazon Kinesis Analytics Part 1 | Part 2
2. Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics
Reference
• Technical documentation
• Amazon Kinesis Agent
• Amazon Kinesis Streams and Spark Streaming
• Amazon Kinesis Producer Library Best Practice
• Amazon Kinesis Firehose and AWS Lambda
• Building Near Real-Time Discovery Platform with Amazon Kinesis
• Public case studies
• Glu mobile – Real-Time Analytics
• Hearst Publishing – Clickstream Analytics
• How Sonos Leverages Amazon Kinesis
• Nordstrom Online Stylist