© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Part 1 of 3: The basics of real-time streaming analytics
Getting started with streaming analytics
Javier Ramirez
AWS Developer Advocate
@supercoco9
Agenda
Why real-time analytics and data streaming?
Challenges of streaming analytics
Useful concepts to reason about streaming data
Components of a streaming analytics pipeline
Overview of popular Open Source components for streaming analytics: Apache Kafka, Apache Spark, Apache Flink, Apache Cassandra, Apache HBase, Elasticsearch
AWS toolbox for streaming analytics: Amazon MSK, Amazon EMR, Amazon Kinesis, Amazon Keyspaces, Amazon DynamoDB, Amazon Elasticsearch Service
Why streaming analytics
• The number of “smart” devices is projected to be 200 billion by 2020 (an over 100x increase in ten years)
• 90% of the data in the world was generated in the last 2 years
• There are 2.5 quintillion bytes of data created each day, and this pace is accelerating
Sources: BI Intelligence Estimates; Forbes, “How much data do we produce”
Data streaming technology enables a customer to ingest, process, and analyze high volumes of high-velocity data from a variety of sources
The value of data diminishes over time
[Figure: Information half-life in decision-making. Value of data to decision-making plotted against time (real time, seconds, minutes, hours, days, months). Time-critical decisions (preventive/predictive, actionable, reactive) sit near real time; traditional “batch” business intelligence (historical) sits at the far end.]
Source: Perishable insights, Mike Gualtieri, Forrester
Can't I just use batch big data analytics tools?
https://aws.amazon.com/streaming-data/
Can't I just use batch big data analytics tools?
Data is never complete
You don’t know the volume of the data before you start
Low latency is expected
Data can come out of order
The system should remain available during upgrades
A simple problem (until you know the details)
I want to calculate the total and average of several numbers
A simple big data problem (until you know the details)
They might be MANY numbers, more than you can store in memory or on a single hard drive
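Even when the numbers do not fit anywhere, a total and an average do not require keeping them: a count and a running sum are enough, and partial results computed on different machines can be merged exactly. A minimal Python sketch of that idea (the `RunningStats` class is a hypothetical helper, not from any library):

```python
from dataclasses import dataclass


# Illustrative sketch; RunningStats is a hypothetical helper, not a library class.
@dataclass
class RunningStats:
    """Total and average of an unbounded stream, kept in constant memory."""

    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        # Only two numbers are stored, however many values arrive.
        self.count += 1
        self.total += value

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Partial results computed on different machines combine exactly.
        return RunningStats(self.count + other.count, self.total + other.total)

    @property
    def average(self) -> float:
        if self.count == 0:
            raise ValueError("the average of an empty stream is undefined")
        return self.total / self.count


# Two workers each see part of the stream; their partial results are merged.
worker_a, worker_b = RunningStats(), RunningStats()
for reading in [3.0, 5.0]:
    worker_a.add(reading)
worker_b.add(10.0)
combined = worker_a.merge(worker_b)
print(combined.total, combined.average)  # 18.0 6.0
```

Mergeability is what lets the work be split across many machines; a median, by contrast, cannot be merged this way and needs approximate structures.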
A simple streaming problem
The dataset is not static, new numbers are coming all the time
A simplish streaming problem
From different sensors, which are geo-distributed and moving. We will be adding and removing sensors all the time
A quite standard streaming problem
And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data
An elastic and scalable streaming problem
Flow will not be constant (from a few events per second to thousands)
An almost real-life streaming analytics scenario
And I don’t want just the overall total and average, but also the totals per month, per week, per day, per hour, per minute…
A real business use case for streaming
We need pretty dashboards with current status, comparison with the past, trends, and anomaly detection
To run this reliably, we need advanced monitoring, alerts, and autoscaling
No, I am not hiring a whole new operations team to manage the system
http://gunshowcomic.com/648
Probably less than you think:
~20 lines of Java code (plus a few hundred with imports, POJOs, and boilerplate, because Java)
OR
a simple GROUP BY statement in SQL with streaming extensions (plus a few lines of boilerplate for schema definition)
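To give a feel for what that GROUP BY computes, here is a rough Python sketch of a per-sensor tumbling-window total and average. The `Reading` record, its fields, and `tumbling_window_stats` are hypothetical names; a real streaming engine would emit each window's result incrementally as data arrives, rather than after consuming a finite list as this sketch does:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


# Illustrative sketch; Reading and tumbling_window_stats are hypothetical names.
class Reading(NamedTuple):
    sensor_id: str
    event_time: int  # seconds since the epoch, stamped by the sensor
    value: float


def tumbling_window_stats(
    readings: Iterable[Reading], window_seconds: int = 60
) -> Dict[Tuple[str, int], Tuple[float, float]]:
    """Python analogue of:
    SELECT sensor_id, window_start, SUM(value), AVG(value)
    FROM readings GROUP BY sensor_id, <tumbling window of window_seconds>
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = r.event_time - r.event_time % window_seconds
        key = (r.sensor_id, window_start)
        sums[key] += r.value
        counts[key] += 1
    return {key: (sums[key], sums[key] / counts[key]) for key in sums}


readings = [
    Reading("a", 0, 1.0),
    Reading("a", 30, 3.0),
    Reading("a", 65, 5.0),
    Reading("b", 10, 4.0),
]
print(tumbling_window_stats(readings))
# {('a', 0): (4.0, 2.0), ('a', 60): (5.0, 5.0), ('b', 0): (4.0, 4.0)}
```

Changing `window_seconds` to 3600 gives per-hour results. Calendar periods such as months are not fixed-width, so they would need date arithmetic instead of the modulo used here.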
Streaming analytics concepts
Streaming data pipeline overview
Ingest → Transform → Analyze → React → Persist
What are the key requirements?
• Durable
• Stateful
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Durability and reliability
Need to store intermediate data
You might want to be able to replay the stream
Self-healing architecture: if one component goes down while data is in flight, the system needs to rebalance and the data needs to be reassigned seamlessly
Monitoring
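Replay and recovery both rest on the in-stream store being an ordered log in which each consumer tracks its own position. A toy in-memory Python sketch of that idea; the class names are hypothetical, the method names only loosely echo Kafka's consumer vocabulary (this is not the Kafka client API), and nothing here is actually durable:

```python
from typing import Any, List


# Illustrative sketch; InStreamLog and Consumer are hypothetical, in-memory only.
class InStreamLog:
    """Append-only log, similar in spirit to one Kafka partition or Kinesis shard:
    records keep their arrival order and can be re-read from any offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position, so it can resume after a crash or rewind to replay."""

    def __init__(self, log: InStreamLog) -> None:
        self.log = log
        self.position = 0

    def poll(self, max_records: int = 100) -> List[Any]:
        return self.log.read(self.position, max_records)

    def commit(self, processed: int) -> None:
        # Advance only after processing succeeded: if the consumer dies before
        # committing, the same records are read again (at-least-once).
        self.position += processed

    def seek(self, offset: int) -> None:
        # Rewind (or skip ahead) to replay the stream from a given offset.
        self.position = offset


log = InStreamLog()
for reading in ["r0", "r1", "r2"]:
    log.append(reading)
consumer = Consumer(log)
batch = consumer.poll()
consumer.commit(len(batch))
consumer.seek(1)
print(consumer.poll())  # ['r1', 'r2']
```

A real in-stream store also replicates the log across machines and drops records once the retention period expires; this sketch keeps everything forever.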
Stateful processing
Working on per-element streams is relatively easy (e.g., changing the format of each item, or filtering out records based on their own properties)
[Figure: individual elements arriving along processing time, 8:00 to 14:00]
Graphics from The Beam Model. By Tyler Akidau and Frances Perry. https://beam.apache.org/community/presentation-materials/
The real fun starts when you need to do transforms/aggregations over groups of elements: group by, count, max, average, joins, filtering based on properties from related records, or complex pattern detection
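The difference between per-element and grouped work is whether anything must be remembered between events. A minimal Python sketch, assuming events are hypothetical `(sensor_id, temperature)` tuples; the function names are illustrative:

```python
from typing import Dict, Iterable, Iterator, Tuple

# Illustrative sketch; event shape and function names are hypothetical.
Event = Tuple[str, float]  # (sensor_id, temperature in degrees Celsius)


def to_fahrenheit(events: Iterable[Event]) -> Iterator[Event]:
    # Per-element: each output depends only on the current input, so no state is kept.
    for sensor_id, celsius in events:
        yield sensor_id, celsius * 9 / 5 + 32


def running_max(events: Iterable[Event]) -> Iterator[Event]:
    # Stateful: each output depends on earlier events with the same key, so one
    # value per key must be stored (and, in a real engine, checkpointed).
    state: Dict[str, float] = {}
    for sensor_id, value in events:
        state[sensor_id] = max(value, state.get(sensor_id, value))
        yield sensor_id, state[sensor_id]


events = [("a", 20.0), ("b", 25.0), ("a", 10.0), ("a", 30.0)]
print(list(to_fahrenheit(events)))  # [('a', 68.0), ('b', 77.0), ('a', 50.0), ('a', 86.0)]
print(list(running_max(events)))    # [('a', 20.0), ('b', 25.0), ('a', 20.0), ('a', 30.0)]
```

The stateless function can be scaled out by sending any event to any worker; the stateful one needs all events for a key to reach the worker that holds that key's state.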
Continuous and fast
Data can come in spikes, faster than we can process it. We need reliable persistent storage for data while it is in flight
You will need to think about how to update a system that never stops receiving data
Since data is never complete, in the case of stateful computations we need to decide when to output results (windowing)
Processing-Time based windows
[Figure: windows assigned by processing time, 8:00 to 14:00]
Graphics from The Beam Model. By Tyler Akidau and Frances Perry. https://beam.apache.org/community/presentation-materials/
Event-Time Based Windows
[Figure: input and output elements plotted by event time (10:00 to 15:00) against processing time (10:00 to 15:00)]
Graphics from The Beam Model. By Tyler Akidau and Frances Perry. https://beam.apache.org/community/presentation-materials/
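The same events can land in different windows depending on which clock you use. A small Python sketch contrasting the two, assuming each hypothetical `Event` carries both an event timestamp and an arrival timestamp in seconds; `fixed_windows` is an illustrative helper:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


# Illustrative sketch; Event and fixed_windows are hypothetical names.
class Event(NamedTuple):
    event_time: int       # when the sensor took the reading (seconds)
    processing_time: int  # when the pipeline received it (seconds)
    value: float


def fixed_windows(events: List[Event], size: int, by: str) -> Dict[int, List[float]]:
    """Group values into fixed windows keyed by window start, using either the
    event timestamp (by="event") or the arrival timestamp (by="processing")."""
    if by not in ("event", "processing"):
        raise ValueError("by must be 'event' or 'processing'")
    windows: Dict[int, List[float]] = defaultdict(list)
    for e in events:
        ts = e.event_time if by == "event" else e.processing_time
        windows[ts - ts % size].append(e.value)
    return dict(windows)


# The second reading was taken at t=50 but only arrived at t=130
# (the sensor went quiet, as in the 3G example earlier).
events = [Event(10, 12, 1.0), Event(50, 130, 2.0), Event(70, 72, 3.0)]
print(fixed_windows(events, 60, by="event"))       # {0: [1.0, 2.0], 60: [3.0]}
print(fixed_windows(events, 60, by="processing"))  # {0: [1.0], 120: [2.0], 60: [3.0]}
```

With event time, the delayed reading is counted in the window in which it was actually measured; with processing time, it is counted whenever it happened to arrive.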
Session Windows
[Figure: input and output elements plotted by event time (10:00 to 15:00) against processing time (10:00 to 15:00)]
Graphics from The Beam Model. By Tyler Akidau and Frances Perry. https://beam.apache.org/community/presentation-materials/
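Session windows have no fixed size: a session ends when a key has been silent for longer than some gap. A minimal Python sketch for a single key, assuming integer timestamps in seconds; `session_windows` is a hypothetical helper that sorts a finite batch, whereas a streaming engine merges sessions incrementally as out-of-order events arrive (Beam, for example, also extends each session by the gap after its last event):

```python
from typing import List, Tuple


# Illustrative sketch; session_windows is a hypothetical helper for one key.
def session_windows(event_times: List[int], gap: int) -> List[Tuple[int, int]]:
    """Group one key's event times into sessions separated by more than `gap`
    seconds of inactivity. Each session is returned as (first event, last event)."""
    sessions: List[Tuple[int, int]] = []
    # Sort by event time, because records may arrive out of order.
    for ts in sorted(event_times):
        if sessions and ts - sessions[-1][1] <= gap:
            start, _ = sessions[-1]
            sessions[-1] = (start, ts)  # close enough: extend the current session
        else:
            sessions.append((ts, ts))  # silence was too long: open a new session
    return sessions


print(session_windows([1, 3, 20, 4, 22, 50], gap=5))  # [(1, 4), (20, 22), (50, 50)]
```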
Correctness: Late-arriving data
Event-time vs Processing-time
Graphics from The Beam Model. By Tyler Akidau and Frances Perry. https://beam.apache.org/community/presentation-materials/
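One common way to deal with late-arriving data is a watermark: an estimate of how far event time has progressed, after which a window is treated as complete. A toy Python sketch, assuming a naive watermark of "highest event time seen minus a fixed `max_delay`" and tumbling windows; the class name and the policy of setting late events aside are illustrative choices, not a specific engine's behavior:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


# Illustrative sketch; the watermark heuristic and late-data policy are assumptions.
class EventTimeWindower:
    """Tumbling event-time windows that emit sums once the watermark passes them.

    Watermark heuristic: the highest event time seen so far minus max_delay.
    Events behind the watermark are set aside as late instead of updating
    a window that is already considered complete.
    """

    def __init__(self, size: int, max_delay: int) -> None:
        self.size = size
        self.max_delay = max_delay
        self.max_event_time: Optional[int] = None
        self.open: Dict[int, List[float]] = defaultdict(list)
        self.late: List[Tuple[int, float]] = []

    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_delay

    def on_event(self, event_time: int, value: float) -> List[Tuple[int, float]]:
        """Add one event; return (window_start, sum) for every window now complete."""
        start = event_time - event_time % self.size
        wm = self.watermark()
        if wm is not None and start + self.size <= wm:
            self.late.append((event_time, value))  # behind the watermark: late
            return []
        self.open[start].append(value)
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        ready = sorted(s for s in self.open if s + self.size <= wm)
        return [(s, sum(self.open.pop(s))) for s in ready]


w = EventTimeWindower(size=10, max_delay=5)
for t, v in [(1, 1.0), (12, 2.0), (8, 3.0), (16, 4.0), (3, 5.0)]:
    emitted = w.on_event(t, v)
    if emitted:
        print("emitted", emitted)  # emitted [(0, 4.0)]
print("late", w.late)              # late [(3, 5.0)]
```

The event at t=8 arrives after t=12 but is still counted, because it is within `max_delay`; the event at t=3 arrives after its window was emitted and ends up in `late`. A larger `max_delay` catches more stragglers at the cost of later results; engines such as Flink and Beam expose this trade-off through configurable allowed lateness.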
Correctness: Delivery semantics
• Exactly once
• At least once
• At most once
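In practice, exactly-once results are often built from at-least-once delivery plus deduplication or idempotent writes. A small Python simulation of that combination, assuming every record carries a unique `event_id`; all names here are hypothetical:

```python
import random
from typing import List, Set, Tuple

# Illustrative sketch; record shape and all names are hypothetical.
Record = Tuple[str, float]  # (unique event_id, value)


def deliver_at_least_once(records: List[Record], redelivery_rate: float, seed: int = 0) -> List[Record]:
    """Simulate an at-least-once channel: nothing is lost, but some records are
    delivered twice (for example, a retry after a lost acknowledgement)."""
    rng = random.Random(seed)
    delivered: List[Record] = []
    for record in records:
        delivered.append(record)
        if rng.random() < redelivery_rate:
            delivered.append(record)  # duplicate delivery
    return delivered


class IdempotentSink:
    """Applies each event_id at most once, so duplicates do not change the result."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()  # unbounded here; real systems expire old IDs
        self.total = 0.0

    def write(self, record: Record) -> None:
        event_id, value = record
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.total += value


records = [(f"e{i}", 1.0) for i in range(100)]
delivered = deliver_at_least_once(records, redelivery_rate=0.2)
naive_total = sum(value for _, value in delivered)  # duplicates inflate this
sink = IdempotentSink()
for record in delivered:
    sink.write(record)
print(naive_total >= 100.0, sink.total)  # True 100.0
```

At-most-once is the mirror image: never retry, and accept that some records are lost.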
Reactive
All the components need to be designed for low latency
[Figure: Information half-life in decision-making; value of data to decision-making from real time to months. Source: Perishable insights, Mike Gualtieri, Forrester]
Components of a streaming analytics pipeline
Streaming analytics components
• Devices and/or applications that produce real-time data at high velocity
• Data from tens of thousands of data sources can be written to a single stream
• Data are stored in the order they were received (per shard or partition) for a set duration of time and can be replayed any number of times during that period
• Records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Database (NoSQL most common), message broker, notification system, file storage, or data lake
• Analytics dashboard
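Writing data from tens of thousands of sources into a single stream works because each record carries a partition key that is hashed to pick a shard or partition, so records with the same key keep their relative order. A simplified Python sketch modeled on Kinesis Data Streams, which maps partition keys to 128-bit integers with MD5; the equal-sized hash ranges and the `shard_for` name are assumptions for illustration (Kafka's default partitioner follows the same idea with the murmur2 hash):

```python
import hashlib
from collections import Counter


# Illustrative sketch; shard_for is hypothetical and assumes equal-sized hash ranges.
def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index in [0, num_shards).

    The key is hashed with MD5 into a 128-bit integer, and each shard owns an
    equal slice of that space (real shards can own unequal ranges after resharding).
    """
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return (hash_key * num_shards) >> 128  # equal split of [0, 2**128)


# Tens of thousands of sensors can share one stream; each sensor always
# lands on the same shard, so its own records stay in order.
load = Counter(shard_for(f"sensor-{i}", 4) for i in range(20_000))
print(sorted(load))  # [0, 1, 2, 3]
```

Choosing a key with many distinct values (such as a sensor ID) spreads the load; a key with few values (such as a country code) can overload a single shard.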
The (excellent) Open Source ecosystem
Ingestion/in-stream storage: Apache Kafka
A distributed streaming platform
Concepts:
Producers
Topics
Brokers
Consumers
Ingestion/in-stream storage: Apache Flume
Distributed, reliable, and available service for collecting, aggregating, and moving large amounts of log data
Stream Processing: Apache Spark
Unified Analytics Engine for large-scale data processing
Concepts:
Driver/Workers
Data Source
Discretized Stream
Transforms
Streaming SQL
Outputs
Stream Processing: Apache Flink
Stateful computations over data streams
Concepts:
JobManager/TaskManagers (workers)
Source
DataStream
Transforms/Operators
Table API/SQL
Sinks
Stream Storage: Apache Cassandra
Manage massive amounts of data, fast, without losing sleep
https://cassandra.apache.org/
Concepts:
Nodes
Token Ring
Consistency Levels
Column Families
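The token ring and consistency levels fit together: the ring decides which nodes hold a row, and the consistency level decides how many of those replicas must respond. A toy Python sketch under explicit simplifications: MD5 instead of Cassandra's Murmur3 partitioner, one token per node instead of virtual nodes, and SimpleStrategy-style placement. `TokenRing` and `quorum` are hypothetical helpers:

```python
import bisect
import hashlib
from typing import List


# Illustrative sketch; simplified placement, not Cassandra's actual code.
class TokenRing:
    """The first replica is the first node whose token is >= the key's token
    (wrapping around the ring); the remaining replicas follow clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 0 < replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def replicas(self, partition_key: str) -> List[str]:
        first = bisect.bisect_left(self.tokens, self._token(partition_key)) % len(self.ring)
        return [self.ring[(first + i) % len(self.ring)][1] for i in range(self.rf)]


def quorum(replication_factor: int) -> int:
    # QUORUM consistency level: a majority of the replicas must acknowledge.
    return replication_factor // 2 + 1


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("sensor-42"))  # three distinct nodes; which ones depends on the hashes
print(quorum(3))  # 2: QUORUM writes and QUORUM reads always overlap (2 + 2 > 3)
```

Adding a node only moves the keys between its token and its predecessor's, which is why the ring scales without reshuffling all the data.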
Stream Storage: Apache HBase
The Hadoop database, a distributed, scalable, big data store
https://hbase.apache.org/book.html
“First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.”
Concepts: HBase Master, Regions, Region Servers, Data Nodes, Column Families
Dashboard: Elasticsearch with Kibana
Elasticsearch is a distributed JSON-based search and analytics engine. Kibana gives shape to your data
https://www.elastic.co/kibana
Wikimedia has a live interactive dashboard powered by Kibana at https://wikimedia.biterg.io/
Concepts:
Master Node
Data Nodes
Shard
Index
Dashboard: Grafana
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored.
https://grafana.com/grafana/
Wikimedia also has a live interactive metrics dashboard powered by Grafana at https://grafana.wikimedia.org/
Concepts:
Data Source
Challenges of data streaming components
• Difficult to set up
• Tricky to scale
• Hard to achieve high availability
• Integration requires development
• Error-prone and complex to manage
• Expensive to maintain
AWS services for streaming analytics
Both managed services and native services
Streaming real-time data with AWS
* Some services scale up and down elastically, while others allow you to automate when to scale up/down
** It is possible to have a serverless data streaming pipeline, in which you pay only for what you use. In the case of managed non-serverless services, you can dynamically adapt to your traffic
Services for Ingestion/in-stream storage
Amazon Managed Streaming for Apache Kafka
Fully managed version of Apache Kafka
Amazon Kinesis Data Streams
Massively scalable, elastic, and durable real-time data streaming
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services.
AWS Glue with serverless streaming
Simple, flexible, and cost-effective ETL
Services for stream processing
Amazon Kinesis Data Analytics for Apache Flink
Fully managed, elastic version of Apache Flink
Amazon Kinesis Data Analytics for SQL Applications
Process and analyze streaming data using standard SQL
Amazon EMR
Easily run and scale Apache Spark and other big data frameworks. You can also run Apache Flink and Apache HBase on EMR
AWS Glue with serverless streaming
Simple, flexible, and cost-effective ETL. Supports Spark for serverless ETL
Services for stream storage
Amazon Keyspaces for Apache Cassandra
Scalable, highly available, and managed Apache Cassandra-compatible database service
Amazon DynamoDB
Fast and flexible NoSQL database service for any scale (for example, in 2017 Samsung Cloud Service was serving 300M users with a total storage of 860TB)
Amazon EMR
Easily run and scale Apache HBase and other big data frameworks. You can also run Apache Flink and Apache Spark on EMR
Services for analytics dashboards
Amazon Elasticsearch Service
Fully managed, scalable, and secure Elasticsearch service
Amazon QuickSight
Fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization.
A serverless data stream (per-element processing)
[Diagram: a data producer continuously streams data into Kinesis Data Streams. The Lambda service continuously polls for new data (1 poll per second) and automatically invokes your function(s), Lambda function A and Lambda function B, when data is found. Destinations shown: Amazon SNS and DynamoDB.]
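With Kinesis Data Streams as an event source, Lambda invokes your function with a batch of records whose payloads are base64-encoded under `record["kinesis"]["data"]`. A minimal per-element handler sketch in Python; the JSON payload shape, the temperature threshold, and the `kinesis_record` test helper are assumptions, and the SNS/DynamoDB writes from the diagram are left out:

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Sketch of a Lambda handler for a batch of Kinesis records.

    Assumptions: producers write JSON like {"sensor_id": "s1", "temperature": 31.5},
    and 30 degrees is an arbitrary illustrative threshold.
    """
    hot_sensors: List[str] = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Per-element processing: each record is judged on its own properties.
        if payload["temperature"] > 30:
            hot_sensors.append(payload["sensor_id"])
    # A real function A or B might now publish to Amazon SNS or write to
    # DynamoDB (for example, with boto3); that step is omitted here.
    return {"hot_sensors": hot_sensors}


# Hypothetical local-test helper: builds a record with only the fields used above.
def kinesis_record(payload: Dict[str, Any]) -> Dict[str, Any]:
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


test_event = {"Records": [
    kinesis_record({"sensor_id": "s1", "temperature": 35.2}),
    kinesis_record({"sensor_id": "s2", "temperature": 21.0}),
]}
print(handler(test_event, None))  # {'hot_sensors': ['s1']}
```

Because the handler keeps no state between invocations, this pattern suits per-element work; windowed aggregations are usually better served by a stateful engine such as Flink.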
Fully managed stateful streaming analytics
Getting Started
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
A great write-up on streaming analytics challenges
https://aws.amazon.com/streaming-data/
Streaming data
https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html
Getting started with Apache Kafka/Amazon MSK
https://aws.amazon.com/kinesis/
Amazon Kinesis Services for streaming data
https://aws.amazon.com/elasticsearch-service/
Amazon Elasticsearch Service
https://dl.acm.org/doi/10.1145/543613.543615
Research on models and issues in data stream systems
Thanks
Javier Ramirez
AWS Developer Advocate
@supercoco9

State of the Union: Compute & DevOps
 
Data Led Migration
Data Led Migration Data Led Migration
Data Led Migration
 
AI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen CybersecurityAI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen Cybersecurity
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
 
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
 
NEW LAUNCH! AWS IoT Analytics from Consumer IoT to Industrial IoT - IOT211 - ...
NEW LAUNCH! AWS IoT Analytics from Consumer IoT to Industrial IoT - IOT211 - ...NEW LAUNCH! AWS IoT Analytics from Consumer IoT to Industrial IoT - IOT211 - ...
NEW LAUNCH! AWS IoT Analytics from Consumer IoT to Industrial IoT - IOT211 - ...
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureGet to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
 
Modern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the CloudModern Data Platforms - Thinking Data Flywheel on the Cloud
Modern Data Platforms - Thinking Data Flywheel on the Cloud
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 

More from javier ramirez

¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfestjavier ramirez
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databasejavier ramirez
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBjavier ramirez
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)javier ramirez
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728javier ramirez
 
Processing and analysing streaming data with Python. Pycon Italy 2022
Processing and analysing streaming  data with Python. Pycon Italy 2022Processing and analysing streaming  data with Python. Pycon Italy 2022
Processing and analysing streaming data with Python. Pycon Italy 2022javier ramirez
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragónjavier ramirez
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessjavier ramirez
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloudjavier ramirez
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analyticsjavier ramirez
 
Monitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSMonitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSjavier ramirez
 
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...javier ramirez
 
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...javier ramirez
 
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...javier ramirez
 

More from javier ramirez (20)

¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDB
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728
 
Processing and analysing streaming data with Python. Pycon Italy 2022
Processing and analysing streaming  data with Python. Pycon Italy 2022Processing and analysing streaming  data with Python. Pycon Italy 2022
Processing and analysing streaming data with Python. Pycon Italy 2022
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragón
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverless
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloud
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analytics
 
Monitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSMonitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWS
 
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
 
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
 
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
 

Recently uploaded

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Recently uploaded (20)

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Getting started with streaming analytics: streaming basics (1 of 3)

  • 1. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. Part 1 of 3: The basics of real-time streaming analytics. Getting started with streaming analytics. Javier Ramirez, AWS Developer Advocate, @supercoco9
  • 2. Agenda. Why real-time analytics and data streaming? Challenges of streaming analytics. Useful concepts to reason about streaming data. Components of a streaming analytics pipeline. Overview of popular open-source components for streaming analytics: Apache Kafka, Apache Spark, Apache Flink, Apache Cassandra, Apache HBase, Elasticsearch. AWS toolbox for streaming analytics: Amazon MSK, Amazon EMR, Amazon Kinesis, Amazon Keyspaces, Amazon DynamoDB, Amazon Elasticsearch Service.
  • 3. Why streaming analytics • The number of “smart” devices is projected to reach 200 billion by 2020 (a more than 100X increase in ten years). • 90% of the data in the world was generated in the last 2 years. • 2.5 quintillion bytes of data are created each day, and the pace is accelerating. Sources: BI Intelligence Estimates; Forbes, “How much data do we produce”. Data streaming technology enables a customer to ingest, process, and analyze high volumes of high-velocity data from a variety of sources.
  • 4. The value of data diminishes over time. [Figure: value of data to decision-making plotted against time since the event (real time, seconds, minutes, hours, days, months), moving from time-critical preventive/predictive and actionable decisions, through reactive, to historical analysis in traditional “batch” business intelligence; labelled “Information half-life in decision-making”.] Source: Perishable insights, Mike Gualtieri, Forrester
  • 5. Can’t I just use batch big data analytics tools? (See https://aws.amazon.com/streaming-data/)
  • 6. Can’t I just use batch big data analytics tools? • Data is never complete. • You don’t know the volume of the data before you start. • Low latency is expected. • Data can arrive out of order. • The system should remain available during upgrades.
  • 7. A simple problem (until you know the details): I want to calculate the total and average of several numbers.
  • 8. A simple big data problem (until you know the details): I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive.
  • 9. A simple streaming problem: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time.
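Because the numbers cannot all be kept and the input never ends, the total and average have to be maintained incrementally. A minimal sketch, assuming a single process with in-memory state: keep only a running count and sum, and report the current total and average after every element. The class names are hypothetical.

```java
// Minimal sketch: store only a running count and sum, never the numbers themselves,
// so memory stays constant no matter how many values arrive.
final class RunningStats {
    private long count = 0;
    private double sum = 0.0;

    // Called once per incoming element; O(1) time and memory.
    void add(double value) {
        count++;
        sum += value;
    }

    long count() { return count; }

    double total() { return sum; }

    // Average of everything seen so far; NaN until the first element arrives.
    double average() { return count == 0 ? Double.NaN : sum / count; }
}

class RunningStatsDemo {
    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        double[] incoming = {10.0, 20.0, 30.0, 40.0};
        for (double v : incoming) {
            stats.add(v);
            // The answer is never final: we report the current value after each element.
            System.out.printf("after %d values: total=%.1f average=%.2f%n",
                    stats.count(), stats.total(), stats.average());
        }
    }
}
```

The same count-and-sum state is what a distributed engine would keep per key and per window; only where that state lives changes.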
  • 10. A simplish streaming problem: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time. They come from different sensors, which are geo-distributed and moving, and we will be adding and removing sensors all the time.
  • 11. A quite standard streaming problem: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time. They come from different sensors, which are geo-distributed and moving, and we will be adding and removing sensors all the time. And since the sensors use 3G and batteries, some might go quiet for a while and then send a bunch of stale data.
  • 12. An elastic and scalable streaming problem: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time. They come from different sensors, which are geo-distributed and moving, and we will be adding and removing sensors all the time. And since the sensors use 3G and batteries, some might go quiet for a while and then send a bunch of stale data. The flow will not be constant (from a few events per second to thousands).
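Handling a flow that swings from a few events per second to thousands usually means splitting the stream into partitions that separate workers process in parallel. A minimal sketch of key-based routing, assuming `String.hashCode()` as the hash function (for illustration only; Kafka and Kinesis each use their own hashing) and a hypothetical `KeyPartitioner` class: every reading from one sensor lands in the same partition, so that sensor's state stays on one worker.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route each event to a partition by hashing its key.
final class KeyPartitioner {
    private final int partitions;

    KeyPartitioner(int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be > 0");
        this.partitions = partitions;
    }

    int partitionFor(String key) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitions);
    }
}

class KeyPartitionerDemo {
    public static void main(String[] args) {
        KeyPartitioner partitioner = new KeyPartitioner(4);
        Map<Integer, Integer> eventsPerPartition = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            String sensorId = "sensor-" + (i % 50); // 50 hypothetical sensors
            eventsPerPartition.merge(partitioner.partitionFor(sensorId), 1, Integer::sum);
        }
        System.out.println(eventsPerPartition); // events routed to each partition
    }
}
```

With this naive modulo scheme, changing the number of partitions remaps most keys to a different partition, which is one reason resizing a stream is not free.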
  • 13. An almost real-life streaming analytics scenario: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time. They come from different sensors, which are geo-distributed and moving, and we will be adding and removing sensors all the time. And since the sensors use 3G and batteries, some might go quiet for a while and then send a bunch of stale data. The flow will not be constant (from a few events per second to thousands). And I don’t just want the overall average, but also totals per month, per week, per day, per hour, per minute…
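Totals per minute, hour, and day can be maintained side by side, with each reading updating one bucket per granularity. A minimal sketch, assuming buckets keyed by each reading's own timestamp (event time) so that stale data flushed by a reconnecting sensor still lands in the buckets where it belongs. `Instant.truncatedTo` only supports units up to `DAYS` (UTC days), so weeks and months would need calendar-aware logic and are left out; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: one reading updates several aggregates at once, one per granularity,
// using the event's own timestamp rather than its arrival time.
final class MultiGranularityTotals {
    static final ChronoUnit[] UNITS = {ChronoUnit.MINUTES, ChronoUnit.HOURS, ChronoUnit.DAYS};

    // unit -> (bucket start -> running total)
    private final Map<ChronoUnit, Map<Instant, Double>> totals = new EnumMap<>(ChronoUnit.class);

    MultiGranularityTotals() {
        for (ChronoUnit unit : UNITS) {
            totals.put(unit, new TreeMap<>());
        }
    }

    void add(Instant eventTime, double value) {
        for (ChronoUnit unit : UNITS) {
            Instant bucket = eventTime.truncatedTo(unit); // e.g. 10:15:30 -> 10:15:00 for MINUTES
            totals.get(unit).merge(bucket, value, Double::sum);
        }
    }

    double total(ChronoUnit unit, Instant bucketStart) {
        Map<Instant, Double> buckets = totals.get(unit);
        if (buckets == null) throw new IllegalArgumentException("unsupported unit: " + unit);
        return buckets.getOrDefault(bucketStart, 0.0);
    }
}

class MultiGranularityDemo {
    public static void main(String[] args) {
        MultiGranularityTotals totals = new MultiGranularityTotals();
        totals.add(Instant.parse("2020-05-01T10:15:30Z"), 5.0);
        // A stale reading that arrives late still updates the 10:00 hour bucket.
        totals.add(Instant.parse("2020-05-01T10:47:00Z"), 2.0);
        System.out.println(totals.total(ChronoUnit.HOURS, Instant.parse("2020-05-01T10:00:00Z")));
    }
}
```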
  • 14. A real business use case for streaming: I want to calculate the total and average of several numbers. There might be MANY numbers, more than you can store in memory or on a single hard drive. The dataset is not static; new numbers keep arriving all the time. They come from different sensors, which are geo-distributed and moving, and we will be adding and removing sensors all the time. And since the sensors use 3G and batteries, some might go quiet for a while and then send a bunch of stale data. The flow will not be constant (from a few events per second to thousands). And I don’t just want the overall average, but also totals per month, per week, per day, per hour, per minute… We need pretty dashboards with the current status, comparisons with the past, trends, and anomaly detection. To run this reliably, we need advanced monitoring, alerts, and autoscaling. No, I am not hiring a whole new operations team to manage the system.
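Anomaly detection on a stream also has to work without storing history. As one simple possibility (a swapped-in technique, not necessarily what the deck has in mind), here is a minimal sketch that keeps a running mean and variance with Welford's online algorithm and flags a reading more than `threshold` standard deviations from the mean of earlier readings. The class name, threshold, and warm-up parameter are hypothetical; anomalous values are folded into the statistics too, for simplicity.

```java
// Hypothetical streaming anomaly detector: running mean/variance via Welford's
// online algorithm, plus a z-score test against the values seen so far.
final class ZScoreDetector {
    private final double threshold; // how many standard deviations count as anomalous
    private final long warmUp;      // readings required before flagging anything
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0;        // running sum of squared deviations from the mean

    ZScoreDetector(double threshold, long warmUp) {
        if (threshold <= 0) throw new IllegalArgumentException("threshold must be > 0");
        this.threshold = threshold;
        this.warmUp = Math.max(2, warmUp); // sample variance needs at least 2 values
    }

    // Returns true if x deviates from the history by more than `threshold` std devs,
    // then folds x into the running statistics.
    boolean observe(double x) {
        boolean anomalous = false;
        if (n >= warmUp) {
            double stdDev = Math.sqrt(m2 / (n - 1)); // sample standard deviation
            anomalous = stdDev > 0 && Math.abs(x - mean) / stdDev > threshold;
        }
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the updated mean, as Welford's method requires
        return anomalous;
    }

    double mean() { return mean; }
}

class ZScoreDemo {
    public static void main(String[] args) {
        ZScoreDetector detector = new ZScoreDetector(3.0, 5);
        double[] readings = {10, 11, 9, 10, 11, 9, 10, 11, 9, 10, 50, 10};
        for (double r : readings) {
            if (detector.observe(r)) {
                System.out.println("anomaly: " + r);
            }
        }
    }
}
```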
  • 17. Probably less than you think: ~20 lines of Java code (plus a few hundred more for imports, POJOs, and boilerplate, because Java), OR a simple GROUP BY statement in SQL with streaming extensions (plus a few lines of boilerplate for the schema definition).
  • 18. Streaming analytics concepts
  • 19. Streaming data pipeline overview: Ingest, Transform, Analyze, React, Persist. What are the key requirements? Durable, stateful, continuous, fast, correct, reactive, reliable.
  • 20. Durability and reliability: We need to store intermediate data. You might want to be able to replay the stream. Self-healing architecture: if one component goes down while data is in flight, the system needs to rebalance, and the data needs to be reassigned seamlessly. Monitoring.
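Replay is what a durable, ordered log buys you. The sketch below is a hypothetical in-memory stand-in (not the API of Kafka or Kinesis): records get sequential offsets, a consumer commits its offset only after it has processed a batch, and a new consumer can start again from any earlier offset. In a real system the log is replicated to disk and the committed offset is stored durably, which is what lets another worker take over after a failure.

```python
from typing import List, Tuple


class ReplayableLog:
    """Hypothetical in-memory stand-in for a durable, append-only stream."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, str]]:
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


class Consumer:
    """Reads from its last committed offset, so a restarted consumer resumes
    where it left off, and a new one can replay from any earlier offset."""

    def __init__(self, log: ReplayableLog, committed: int = 0) -> None:
        self.log = log
        self.committed = committed

    def poll(self) -> List[Tuple[int, str]]:
        return self.log.read(self.committed)

    def commit(self, batch: List[Tuple[int, str]]) -> None:
        # Call only after the batch is fully processed: a crash before this
        # point means the batch is read again (at-least-once behaviour).
        if batch:
            self.committed = batch[-1][0] + 1
```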
  • 21. Stateful processing: Working on per-element streams is relatively easy (e.g. changing the format of each item, or filtering out records based on their own properties). (Figure: processing-time axis, 8:00 to 14:00. Graphics from The Beam Model, by Tyler Akidau and Frances Perry: https://beam.apache.org/community/presentation-materials/) The real fun starts when you need to do transforms/aggregations over groups of elements: group by, count, max, average, joins, filtering based on properties of related records, or complex pattern detection.
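The difference between per-element and stateful processing shows up directly in code. In this sketch (the record fields `sensor` and `value_f`, and the names `to_celsius` and `CountPerSensor`, are hypothetical), the first function needs nothing but the current record, while the counter's answer depends on every record seen before, so that state has to survive restarts and rebalancing.

```python
from collections import Counter
from typing import Iterable, Iterator


def to_celsius(readings: Iterable[dict]) -> Iterator[dict]:
    """Per-element (stateless): each record is filtered and reformatted on its own."""
    for r in readings:
        if r.get("value_f") is None:
            continue  # filter out records based on their own properties
        yield {"sensor": r["sensor"], "value_c": (r["value_f"] - 32) * 5 / 9}


class CountPerSensor:
    """Stateful: the output depends on all earlier records, so this state must
    be kept (and, in a real engine, checkpointed) across the whole stream."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, reading: dict) -> int:
        self.counts[reading["sensor"]] += 1
        return self.counts[reading["sensor"]]
```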
  • 22. Continuous and fast: Data can come in spikes, faster than we can process it, so we need reliable persistent storage for data in flight. You will need to think about how to update a system that never stops receiving data. Since the data is never complete, for stateful computations we need to decide when to output results (windowing).
  • 23. Processing-time-based windows (figure: processing-time axis, 8:00 to 14:00. Graphics from The Beam Model, by Tyler Akidau and Frances Perry: https://beam.apache.org/community/presentation-materials/)
  • 24. Event-time-based windows (figure: input and output plotted by event time and processing time, 10:00 to 15:00. Graphics from The Beam Model, by Tyler Akidau and Frances Perry: https://beam.apache.org/community/presentation-materials/)
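Which clock assigns an event to a window changes the answer. In this sketch (the `Event` type, `tumbling_sums`, and the numbers are hypothetical), a reading produced at second 30 by a sensor that went quiet only reaches the pipeline at second 130. With 60-second windows it is counted in the first window under event time, but in the third window under processing time.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    event_time: int       # when the sensor produced the value (epoch seconds)
    processing_time: int  # when the pipeline received it (epoch seconds)
    value: float


def tumbling_sums(events: List[Event], size: int, by_event_time: bool) -> Dict[int, float]:
    """Sum values into fixed windows of `size` seconds, keyed by window start."""
    sums: Dict[int, float] = defaultdict(float)
    for e in events:
        t = e.event_time if by_event_time else e.processing_time
        sums[t - t % size] += e.value
    return dict(sums)


if __name__ == "__main__":
    events = [Event(10, 12, 1.0), Event(30, 130, 2.0), Event(70, 75, 3.0)]
    print(tumbling_sums(events, 60, by_event_time=True))   # {0: 3.0, 60: 3.0}
    print(tumbling_sums(events, 60, by_event_time=False))  # {0: 1.0, 120: 2.0, 60: 3.0}
```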
  • 25. Session windows (figure: input and output plotted by event time and processing time, 10:00 to 15:00. Graphics from The Beam Model, by Tyler Akidau and Frances Perry: https://beam.apache.org/community/presentation-materials/)
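Session windows have no fixed length: a session ends when a key has been inactive for longer than a chosen gap. A minimal batch sketch over the event times of a single key (`session_windows` and `gap` are hypothetical names); a streaming engine would instead merge sessions incrementally as out-of-order events arrive.

```python
from typing import List, Tuple


def session_windows(timestamps: List[int], gap: int) -> List[Tuple[int, int, int]]:
    """Group event times into sessions. A new session starts whenever the
    distance to the previous event exceeds `gap`. Returns (start, end, count)."""
    sessions: List[Tuple[int, int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            start, _, count = sessions[-1]
            sessions[-1] = (start, ts, count + 1)  # extend the current session
        else:
            sessions.append((ts, ts, 1))  # inactivity gap exceeded: open a new one
    return sessions
```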
  • 26. Correctness: late-arriving data. Event time vs. processing time. (Graphics from The Beam Model, by Tyler Akidau and Frances Perry: https://beam.apache.org/community/presentation-materials/)
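One common way to decide when an event-time window is complete is a watermark. The sketch below assumes a simple heuristic watermark, the maximum event time seen so far minus a fixed allowed delay. This is similar in spirit to the bounded-out-of-orderness watermarks offered by engines such as Apache Flink, but the class and parameter names here are hypothetical. Windows are emitted once the watermark reaches their end, and anything arriving for an already closed window is set aside as late.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class WatermarkWindower:
    """Event-time tumbling windows closed by a heuristic watermark:
    watermark = (max event time seen so far) - max_delay."""

    def __init__(self, size: int, max_delay: int) -> None:
        self.size = size
        self.max_delay = max_delay
        self.max_seen: Optional[int] = None
        self.open: Dict[int, float] = defaultdict(float)  # window start -> running sum
        self.emitted: List[Tuple[int, float]] = []        # (window start, final sum)
        self.late: List[Tuple[int, float]] = []           # (event time, value)

    def watermark(self) -> float:
        if self.max_seen is None:
            return float("-inf")
        return self.max_seen - self.max_delay

    def on_event(self, event_time: int, value: float) -> None:
        start = event_time - event_time % self.size
        if start + self.size <= self.watermark():
            # The watermark has already reached this window's end: too late.
            self.late.append((event_time, value))
            return
        self.open[start] += value
        self.max_seen = event_time if self.max_seen is None else max(self.max_seen, event_time)
        # Emit (and forget) every window whose end the watermark has now reached.
        for s in sorted(self.open):
            if s + self.size <= self.watermark():
                self.emitted.append((s, self.open.pop(s)))
```

A larger `max_delay` accepts more stragglers but holds results back longer; that latency versus completeness trade-off is the core of handling late data.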
  • 27. Correctness: delivery semantics (exactly once, at least once, at most once)
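Exactly-once results are often obtained by combining at-least-once delivery (retry until acknowledged, which may redeliver) with an idempotent sink. A minimal sketch, assuming each record carries a unique ID (`IdempotentSink` and `deliver` are hypothetical). At-most-once would correspond to acknowledging before writing, so a crash loses the record instead of duplicating it.

```python
from typing import Iterable, Set, Tuple


class IdempotentSink:
    """Ignores records it has already applied, so redelivered records
    (from at-least-once retries) take effect only once."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.total = 0.0

    def write(self, record_id: str, value: float) -> bool:
        if record_id in self.seen:
            return False  # duplicate from a retry: drop it
        self.seen.add(record_id)
        self.total += value
        return True


def deliver(sink: IdempotentSink, batch: Iterable[Tuple[str, float]]) -> None:
    for record_id, value in batch:
        sink.write(record_id, value)
```

The `seen` set grows without bound here; real systems usually bound it, for example with transactional writes or upserts keyed on the record ID.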
  • 28. Reactive: All the components need to be designed for low latency. (Figure: value of data to decision-making over time, from real time through seconds, minutes, hours, and days to months; preventive/predictive, actionable, reactive, historical; time-critical decisions vs. traditional “batch” business intelligence; information half-life in decision-making. Source: Perishable insights, Mike Gualtieri, Forrester)
  • 29. Components of a streaming analytics pipeline
  • 30. Streaming analytics components: Devices and/or applications produce real-time data at high velocity. Data from tens of thousands of data sources can be written to a single stream. Data are stored in the order they were received for a set duration of time and can be replayed any number of times during that period. Records are read in the order they are produced, enabling real-time analytics or streaming ETL. Results go to a database (NoSQL is most common), message broker, notification system, file storage, or data lake, and to an analytics dashboard.
  • 31. The (excellent) Open Source ecosystem
  • 32. Ingestion/in-stream storage: Apache Kafka. A distributed streaming platform. Concepts: Producers, Topics, Brokers, Consumers.
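A Kafka topic is split into partitions, and for keyed records the producer picks the partition from the record key, so all records for one key stay in order within one partition. The toy sketch below uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2) and plain Python lists in place of brokers; `partition_for`, `produce`, and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2 rather than CRC32,
    # but the principle, hash(key) modulo the partition count, is the same.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Append (key, value) records to the per-partition logs of a toy topic."""
    topic: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in records:
        topic[partition_for(key)].append((key, value))
    return dict(topic)
```

Ordering is therefore guaranteed per key (per partition), not across the whole topic, which is usually what sensor-style workloads need.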
  • 33. Ingestion/in-stream storage: Apache Flume. A distributed, reliable, and available service for collecting, aggregating, and moving large amounts of log data.
  • 34. Stream processing: Apache Spark. Unified Analytics Engine for large-scale data processing. Concepts: Driver/Workers, Data Source, Discretized Stream, Transforms, Streaming SQL, Outputs.
  • 35. Stream processing: Apache Spark. Unified Analytics Engine for large-scale data processing.
  • 36. Stream processing: Apache Flink. Stateful computation over data streams. Concepts: Job Manager/Workers, Source, DataStream, Transforms/Operators, Table API/SQL, Sinks.
  • 37. Stream processing: Apache Flink. Stateful computation over data streams.
  • 38. Stream processing: Apache Flink. Stateful computation over data streams.
  • 39. Stream storage: Apache Cassandra. Manage massive amounts of data, fast, without losing sleep (https://cassandra.apache.org/). Concepts: Nodes, Token Ring, Consistency Levels, Column Families.
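Consistency levels come down to simple arithmetic on the replication factor (RF). As the usual rule of thumb, a read is guaranteed to overlap at least one replica that acknowledged the latest write when R + W > RF, and QUORUM means a majority of the replicas. A small sketch of that arithmetic (`quorum` and `read_sees_latest_write` are hypothetical helper names):

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """The read set and the write set share at least one replica when R + W > RF."""
    return read_replicas + write_replicas > rf
```

With RF = 3, writing and reading at QUORUM (2 + 2 > 3) overlaps, while ONE/ONE (1 + 1) trades that guarantee for lower latency.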
  • 40. Stream storage: Apache Cassandra. Manage massive amounts of data, fast, without losing sleep.
  • 41. Stream storage: Apache HBase. The Hadoop database, a distributed, scalable, big data store (https://hbase.apache.org/book.html): “First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.” Concepts: HBase Master, Regions, Region Servers, Data Nodes, Column Families.
  • 42. Dashboard: Elasticsearch with Kibana. Elasticsearch is a distributed, JSON-based search and analytics engine. Kibana gives shape to your data (https://www.elastic.co/kibana). Wikimedia has a live interactive dashboard powered by Kibana (https://wikimedia.biterg.io/). Concepts: Master Node, Data Nodes, Shard, Index.
  • 43. Dashboard: Grafana. Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored (https://grafana.com/grafana/). Wikimedia also has a live interactive metrics dashboard powered by Grafana (https://grafana.wikimedia.org/). Concepts: Data Source.
  • 44. Challenges of data streaming components: difficult to set up; tricky to scale; hard to achieve high availability; integration requires development; error-prone and complex to manage; expensive to maintain.
  • 45. AWS services for streaming analytics: both managed services and native services
  • 46. Streaming real-time data with AWS. * Some services scale up and down elastically, while others let you automate when to scale up or down. ** It is possible to build a serverless data streaming pipeline in which you pay only for what you use. With managed, non-serverless services, you can dynamically adapt capacity to your traffic.
  • 47. Services for ingestion/in-stream storage: Amazon Managed Streaming for Apache Kafka (a fully managed version of Apache Kafka); Amazon Kinesis Data Streams (massively scalable, elastic, and durable real-time data streaming); Amazon Kinesis Data Firehose (the easiest way to reliably load streaming data into data lakes, data stores, and analytics services); AWS Glue with serverless streaming (simple, flexible, and cost-effective ETL).
  • 48. Services for stream processing: Amazon Kinesis Data Analytics for Apache Flink (a fully managed, elastic version of Apache Flink); Amazon Kinesis Data Analytics for SQL Applications (process and analyze streaming data using standard SQL); Amazon EMR (easily run and scale Apache Spark and other big data frameworks; you can also run Apache Flink and Apache HBase on EMR); AWS Glue with serverless streaming (simple, flexible, and cost-effective ETL; supports Spark for serverless ETL).
  • 49. Services for stream storage: Amazon Keyspaces for Apache Cassandra (a scalable, highly available, managed Apache Cassandra-compatible database service); Amazon DynamoDB (a fast and flexible NoSQL database service for any scale; for example, in 2017 Samsung Cloud Service was serving 300M users with a total storage of 860 TB); Amazon EMR (easily run and scale Apache HBase and other big data frameworks; you can also run Apache Flink and Apache Spark on EMR).
  • 50. Services for analytics dashboards: Amazon Elasticsearch Service (a fully managed, scalable, and secure Elasticsearch service); Amazon QuickSight (a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization).
  • 51. A serverless data stream (per-element processing): A data producer continuously streams data into Kinesis Data Streams. The Lambda service continuously polls the stream for new data (one poll per second) and automatically invokes your function(s), Lambda function A and Lambda function B, when data is found. The downstream targets in the diagram are Amazon SNS and DynamoDB.
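Per-element processing in this setup is an ordinary Lambda handler. Kinesis delivers a batch of records in the invocation event, each with its payload base64-encoded under `record["kinesis"]["data"]`. A minimal sketch, assuming JSON payloads with hypothetical `sensor` and `value` fields; writing the items to DynamoDB or publishing to SNS (for example with boto3) is left out so that the function runs locally.

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Per-element transform for a Kinesis-triggered Lambda function."""
    items = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded; json.loads accepts the decoded bytes.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if "sensor" not in payload or "value" not in payload:
            continue  # drop malformed readings based only on their own fields
        items.append({
            "sensor": payload["sensor"],
            "value": float(payload["value"]),
            "sequence_number": record["kinesis"]["sequenceNumber"],
        })
    # A real function might now write `items` to DynamoDB or publish to SNS.
    return items
```

Because each invocation only sees its own batch, anything that needs state across batches (running totals, windows) belongs in the stateful, fully managed option instead.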
  • 52. Fully managed stateful streaming analytics
  • 53. Getting started: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying (a great write-up on streaming analytics challenges); https://aws.amazon.com/streaming-data/ (streaming data); https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html (getting started with Apache Kafka/Amazon MSK); https://aws.amazon.com/kinesis/ (Amazon Kinesis services for streaming data); https://aws.amazon.com/elasticsearch-service/ (Amazon Elasticsearch Service); https://dl.acm.org/doi/10.1145/543613.543615 (research on models and issues in data stream systems)
  • 54. Thanks. Javier Ramirez, AWS Developer Advocate, @supercoco9