© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U MMI T
AWS Analytics Services - When to use what?
With SimilarWeb
DAT201
Roy Hasson, Business Development Lead – Analytics and Data Lakes, Amazon Web Services
Ido Senesh, Sr. Software Engineer, SimilarWeb
Modern Data Challenges
There is more data than people think
● Data grows >10x every 5 years
● Data platforms need to scale 1,000x
● Data platforms need to live for 15 years
Modern Data Challenges
There are more people accessing data: data scientists, analysts, business users, applications
And more requirements for making data available: secure, real time, flexible, scalable
Modern Data Challenges
● Democratization of data
● Governance & control
There are more people working with data than ever before.
How do I provide democratized access to data to enable informed decisions while at the same time enforcing data governance and preventing mismanagement of the data?
AWS databases and analytics
Broad and deep portfolio, built for builders
● Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect
● Data Lake: S3/Amazon Glacier; AWS Glue (ETL & Data Catalog); Lake Formation (data lakes)
● Analytics: Amazon Redshift (data warehousing); Amazon EMR (Hadoop + Spark); Athena (interactive analytics); Kinesis Analytics (real-time); Amazon Elasticsearch Service (operational analytics)
● Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server); Aurora (MySQL, PostgreSQL); DynamoDB (key value, document); ElastiCache (Redis, Memcached); Neptune (graph); Timestream (time series); QLDB (ledger database); RDS on VMware
● Business Intelligence & Machine Learning: Amazon QuickSight; Amazon SageMaker; Amazon Comprehend; Amazon Rekognition; Amazon Lex; Amazon Transcribe; AWS DeepLens
● Blockchain: Managed Blockchain; Blockchain Templates
● AWS Marketplace: 250+ solutions; 730+ database solutions; 600+ analytics solutions; 25+ blockchain solutions; 20+ data lake solutions; 30+ solutions
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
More data lakes & analytics on AWS than anywhere else
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis Data Streams
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon Kinesis Data Firehose
Ingestion: Streaming Events
Sources: devices, web, sensors, social, EDW
● Ingest streaming events in real time with Amazon Kinesis
● Real-time data analysis with Amazon Kinesis Data Analytics, then take action
● Output streaming data to select destinations
● Optimize file format: S3://bucket/year=yyyy/month=mm/file.parquet, S3://bucket/year=yyyy/month=mm/file.orc
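The target layout above, year=yyyy/month=mm/, is a Hive-style partitioned key: engines such as Athena can skip every folder whose year or month does not match a query filter. A minimal sketch of deriving such a key from an event timestamp, assuming a hypothetical bucket `my-data-lake` and a hypothetical helper `partitioned_key` (it writes the URI scheme in its usual lowercase form); in practice the delivery service or ETL job usually builds the prefix for you.

```python
from datetime import datetime, timezone


def partitioned_key(bucket: str, event_time: datetime, file_name: str) -> str:
    """Build a Hive-style partitioned URI: s3://bucket/year=YYYY/month=MM/file."""
    # Normalize to UTC so events recorded in different time zones share one partition scheme.
    ts = event_time.astimezone(timezone.utc)
    return f"s3://{bucket}/year={ts.year:04d}/month={ts.month:02d}/{file_name}"


if __name__ == "__main__":
    event = datetime(2019, 3, 7, 12, 30, tzinfo=timezone.utc)
    print(partitioned_key("my-data-lake", event, "part-0000.parquet"))
    # prints s3://my-data-lake/year=2019/month=03/part-0000.parquet
```

Normalizing to UTC is a design choice: without it, an event just after midnight in one time zone and the same instant in another would land in different monthly partitions.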
AWS Database Migration Service
AWS Glue ETL
Ingestion: Database and Data Warehouse
Sources: devices, web, sensors, social, EDW
● Raw extracts: S3://bucket/table/LOAD001.csv, S3://bucket/table/20181127-1134010000.csv
● Optimized, partitioned output: S3://bucket/year=yyyy/month=mm/file.parquet, S3://bucket/year=yyyy/month=mm/file.orc
Ingestion: How to choose
Decision chart (criteria: event stream or batch operation, database source, CDC via snapshot or incremental load, persist data, real-time analytics, open source, seamless scaling; outcomes include AWS DMS and Amazon MSK)
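The ingestion criteria (database source, CDC, event stream, persisted data, open source, scaling) can be read as a small decision procedure. A rough sketch of one such reading, assuming a hypothetical `IngestionNeeds` record and `suggest_ingestion_service` helper; the branch order is our assumption, not a reproduction of the slide's chart.

```python
from dataclasses import dataclass


@dataclass
class IngestionNeeds:
    """Hypothetical description of a workload; the field names are illustrative only."""
    database_source: bool = False  # replicating tables from a database or data warehouse
    event_stream: bool = True      # continuous events from devices, web, sensors, social
    needs_replay: bool = False     # consumers must persist and re-read records
    kafka_api: bool = False        # applications already speak the open-source Kafka protocol


def suggest_ingestion_service(needs: IngestionNeeds) -> str:
    """Return one candidate service; a rough reading of the criteria, not an official rule."""
    if needs.database_source:
        # DMS covers a full-load snapshot plus ongoing change data capture (CDC).
        return "AWS DMS"
    if needs.event_stream:
        if needs.kafka_api:
            # Managed Apache Kafka keeps the open-source API.
            return "Amazon MSK"
        if needs.needs_replay:
            # Kinesis Data Streams retains records for multiple, repeatable consumers.
            return "Amazon Kinesis Data Streams"
        # Firehose loads streams straight into destinations such as S3, scaling automatically.
        return "Amazon Kinesis Data Firehose"
    # Remaining batch copies: a scheduled Glue ETL job is one option from the slides.
    return "AWS Glue ETL"
```

For example, `suggest_ingestion_service(IngestionNeeds(kafka_api=True))` returns `"Amazon MSK"`. Real decisions also weigh cost, retention, and team familiarity, which a boolean checklist cannot capture.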
AWS Glue Data Catalog
Search and explore available data
● Unified metadata repository across relational databases, Amazon RDS, Amazon Redshift, and Amazon S3
● Single searchable view into your data, no matter where it is stored
● Ability to automatically crawl and classify your data
● Augment technical metadata with business metadata for tables
● Manage access to data using fine-grained access controls, even finer with AWS Lake Formation
● Apache Hive metastore compatible and integrated with AWS Analytics services
AWS Glue Crawlers
Automatically catalog your data
● Crawlers automatically build your Data Catalog and keep it in sync
● Automatically discover new data and extract schema definitions
● Detect schema changes and version tables
● Detect Hive-style partitions on Amazon S3
● Built-in classifiers for popular types; custom classifiers using Grok expressions
● Run ad hoc or on a schedule; serverless, so you only pay when the crawler runs
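Crawlers recognize Hive-style partitions from key=value folder names in object keys. The functions below are a minimal plain-Python illustration of that idea, not the crawler's actual algorithm; `hive_partitions` and `partition_columns` are hypothetical helpers.

```python
from typing import Dict, List


def hive_partitions(uri: str) -> Dict[str, str]:
    """Extract key=value partition segments from an object key (the file name is excluded)."""
    path = uri.split("://", 1)[-1]   # strip an optional scheme such as s3://
    *dirs, _file = path.split("/")   # the last segment is the object name
    parts: Dict[str, str] = {}
    for segment in dirs:
        name, sep, value = segment.partition("=")
        if sep and name:              # only key=value segments count as partitions
            parts[name] = value
    return parts


def partition_columns(uris: List[str]) -> List[str]:
    """Return partition column names in path order, if every key agrees on them."""
    layouts = {tuple(hive_partitions(u)) for u in uris}
    if len(layouts) != 1:
        raise ValueError(f"inconsistent partition layouts: {sorted(layouts)}")
    return list(layouts.pop())
```

For example, `hive_partitions("s3://bucket/year=2019/month=03/file.parquet")` returns `{"year": "2019", "month": "03"}`. Raising on mixed layouts is a deliberate choice for the sketch; a real crawler has its own rules for grouping such files into tables.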
AWS Lake Formation (join the preview)
Build, secure, and manage a data lake in days
● Build a data lake in days, not months: build and deploy a fully managed data lake with a few clicks
● Enforce security policies across multiple services: centrally define security, governance, and auditing policies in one place and enforce those policies for all users and all applications
● Combine different analytics approaches: empower analyst and data scientist productivity, giving them self-service discovery and safe access to all data from a single catalog
Hive Metastore Client for AWS Glue Data Catalog
• Connect Hive metastore-compatible platforms to the AWS Glue Data Catalog
• Apache Hive 2.x compatible
• Apache 2.0 license
https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
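On Amazon EMR, the same Glue-backed metastore client is switched on through configuration classifications rather than by installing the connector. A sketch that builds that configuration as JSON, for example for `aws emr create-cluster --configurations file://config.json`; the helper name `glue_catalog_configuration` is hypothetical, and the property value should be checked against the current EMR documentation.

```python
import json

# Client factory class used by EMR and the open-source connector to reach the Glue Data Catalog.
GLUE_CLIENT_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def glue_catalog_configuration(include_spark: bool = True) -> list:
    """EMR configurations pointing Hive (and optionally Spark SQL) at the Glue Data Catalog."""
    props = {"hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY}
    configs = [{"Classification": "hive-site", "Properties": dict(props)}]
    if include_spark:
        # Spark SQL reads its metastore settings from a separate classification.
        configs.append({"Classification": "spark-hive-site", "Properties": dict(props)})
    return configs


if __name__ == "__main__":
    print(json.dumps(glue_catalog_configuration(), indent=2))
```

Copying `props` per classification keeps the two entries independent if one is later edited.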
Data Catalog
AWS Glue Data Catalog
● Crawl data sources, catalog schema & partitions
● Connect Hive-compatible sources via open connector
● Search and discover data in your data lake
● Integrated AWS Analytics tools
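Catalog: How to choose
Decision chart (criteria: auto-discovery, AWS Analytics integration, governance, fine-grained access control (FGAC); outcomes include AWS Lake Formation, the AWS Glue Data Catalog (GDC) with its open connector, and a managed Hadoop Hive Metastore on RDS)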
AWS Glue ETL
Amazon EMR
Amazon Redshift
Data Transformation
● Sources: structured and unstructured data available in the raw S3 bucket, plus other real-time streaming sources
● Target: transformed S3 bucket for querying
● Sometimes ELT is a better option
Process: How to choose
Decision chart (criteria: consistent schema/big data, variable schema/small data, cluster customization, serverless, SQL-based transforms, transactional, job under 15 minutes; outcomes include Apache Spark and Python Shell)
Amazon Athena
Diagram: a data lake with permissions and the AWS Glue Data Catalog, serving reporting & analytics, machine learning, and custom applications in the AWS Cloud
Amazon EMR Notebooks in the Console
A managed analytics environment based on Jupyter Notebooks
● EMR-managed notebook based on Jupyter Notebook, opened by users from the AWS Management Console for EMR
● Auto-saves notebook files to your S3 bucket
● Runs queries on your remote EMR cluster
Diagram labels: EMR VPC, customer VPC, Amazon EMR clusters
Amazon Elasticsearch Service
Amazon QuickSight
Serve and Consume
Diagram: a data lake with permissions and the AWS Glue Data Catalog, serving reporting & analytics, machine learning, and custom applications in the AWS Cloud
Serve: How to choose
Decision chart (criteria: interactive query, free-form search, millisecond response, serverless, interactive code, repeated queries)
Build with flexibility in mind
● Open source
● Secure
● Integrated
● Managed & elastic
A queryable interface for the entire goddamn internet
SimilarWeb’s Lead Generator
SimilarWeb’s Mission
SimilarWeb gives you digital market intelligence for every website and mobile app worldwide to understand, track and grow your market share.
Some Numbers
● 500M websites
● 100+ dimensions
● 50+ countries
The Use Case
Sales person: "Show me all websites with 50M+ monthly visits from the United States, 60%+ mobile share, a bounce rate below 10%, more than 30% of visits by men aged 18-25, and traffic that spiked by 30%+ in the past year."
The internet as measured by SimilarWeb: 500M websites × 100+ dimensions × 50+ countries
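The sales person's request maps onto a single SQL filter. A toy sketch using Python's built-in `sqlite3` as a stand-in for Athena, with a hypothetical one-table schema and made-up rows; the real data set spans 500M websites stored on S3, and Athena's Presto dialect differs from SQLite in places, though this WHERE clause is portable.

```python
import sqlite3

# Hypothetical single-table schema; shares and growth are fractions (0.30 == 30%).
SCHEMA = """
CREATE TABLE websites (
    domain TEXT, country TEXT, monthly_visits INTEGER,
    mobile_share REAL, bounce_rate REAL,
    male_18_25_share REAL, yoy_traffic_growth REAL
)"""

# Made-up rows for illustration only.
SAMPLE_ROWS = [
    ("a.example", "US", 80_000_000, 0.70, 0.08, 0.35, 0.40),
    ("b.example", "US", 60_000_000, 0.50, 0.05, 0.40, 0.50),  # mobile share too low
    ("c.example", "DE", 90_000_000, 0.80, 0.05, 0.40, 0.50),  # wrong country for a US search
    ("d.example", "US", 55_000_000, 0.65, 0.09, 0.31, 0.30),
]

LEAD_QUERY = """
SELECT domain FROM websites
WHERE country = ?
  AND monthly_visits >= ?      -- 50M+ monthly visits
  AND mobile_share >= ?        -- 60%+ mobile share
  AND bounce_rate < ?          -- bounce rate below 10%
  AND male_18_25_share > ?     -- more than 30% men aged 18-25
  AND yoy_traffic_growth >= ?  -- traffic up 30%+ year over year
ORDER BY monthly_visits DESC"""


def sample_db() -> sqlite3.Connection:
    """Build an in-memory database holding the made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO websites VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def find_leads(conn, country="US", min_visits=50_000_000, min_mobile=0.60,
               max_bounce=0.10, min_male_18_25=0.30, min_growth=0.30):
    """Run the lead filter with bound parameters and return matching domains."""
    params = (country, min_visits, min_mobile, max_bounce, min_male_18_25, min_growth)
    return [domain for (domain,) in conn.execute(LEAD_QUERY, params)]


if __name__ == "__main__":
    print(find_leads(sample_db()))  # ['a.example', 'd.example']
```

Binding the thresholds as parameters keeps one query text for every sales request, which is what lets a tool expose the "queryable interface" without building SQL strings by hand.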
Billions of records to query & process for each report
Requirements
● Zero operations
● Cost efficient
● High Availability
● Responsiveness (~seconds query time)
● Data is stored on S3
● Schema evolution
Is it possible?
Amazon Athena
Amazon Athena
Serverless
Just connect to an endpoint
and submit your queries
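"Connect to an endpoint and submit your queries" amounts to two API calls: start a query, then poll its state. A minimal sketch written against a boto3-style Athena client passed in as a parameter, so it can be exercised with a fake; `run_query` is a hypothetical helper, and the database name and output location are placeholders you would supply.

```python
import time
from typing import Any

# Athena query states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena: Any, sql: str, database: str, output_location: str,
              workgroup: str = "primary", poll_seconds: float = 1.0) -> str:
    """Submit a query, poll until it finishes, and return its QueryExecutionId."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)  # each poll is an API call, so keep the interval modest
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended in {status['State']}: {reason}")
    return query_id
```

With boto3 installed and credentials configured, a call might look like `run_query(boto3.client("athena"), "SELECT 1", "my_database", "s3://my-athena-results/")`, after which `get_query_results(QueryExecutionId=...)` pages through the rows.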
Amazon Athena
Cost Effectiveness
Running 10k queries with
a monthly cost of $150
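Athena bills by data scanned, so the figure above can be turned around into an average scan size per query. A back-of-the-envelope sketch, assuming the $5 per TB list price and the 10 MB per-query minimum that applied around the time of the talk (check current regional pricing) and treating 1 TB as 10^12 bytes; both helper names are hypothetical.

```python
import math

PRICE_PER_TB_USD = 5.0   # assumption: Athena list price per TB scanned around 2019
MIN_MB_PER_QUERY = 10    # assumption: each query is billed for at least 10 MB


def query_cost_usd(bytes_scanned: int) -> float:
    """Cost of one query: scanned bytes rounded up to whole MB, with the 10 MB minimum."""
    billed_mb = max(math.ceil(bytes_scanned / 10**6), MIN_MB_PER_QUERY)
    return billed_mb / 10**6 * PRICE_PER_TB_USD  # 10**6 MB per TB (decimal units)


def average_scan_gb(monthly_cost_usd: float, queries: int) -> float:
    """Invert the bill: average GB scanned per query implied by a monthly cost."""
    total_tb = monthly_cost_usd / PRICE_PER_TB_USD
    return total_tb * 1000 / queries


if __name__ == "__main__":
    # The slide's numbers: 10k queries for $150/month.
    print(average_scan_gb(150, 10_000))  # prints 3.0
```

Under these assumptions $150 buys about 30 TB of scanning, roughly 3 GB per query, so anything that shrinks the bytes scanned (partitioning, columnar formats, ordered data) lowers the bill directly.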
Amazon Athena
Fully automated
data discovery
using Glue Crawlers
Amazon Athena
SQL support
Provide customers with rich features
(ordering, aggregations, analytic functions)
without any effort from our side
Amazon Athena
● Serverless
● Cost effectiveness
● Fully automated data discovery
● SQL support
Preparing for production?
Production Readiness
● Out of the box, the limit is 5 concurrent queries per second
○ It is a soft limit: open a limit-increase ticket with AWS Support
● Workgroups: control costs and limit parallelism per business case
● Monitoring: a keep-alive every second is not a good idea (it cost us $1,000)
● Disaster recovery
○ Data: S3 Cross-Region Replication, provided by AWS
○ Metadata: you need to take care of it yourself (Lambda, crawlers)
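Until a limit increase is approved, clients have to stay under the default rate themselves. A minimal client-side token bucket, assuming the limit behaves like a start rate of 5 queries per second (for example `TokenBucket(rate=5, capacity=5)`); `TokenBucket` is a hypothetical helper that complements, rather than replaces, retrying on throttling errors. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate` acquisitions per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, never beyond capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a token is available, sleeping just long enough for the next one."""
        while not self.try_acquire():
            sleep((1 - self.tokens) / self.rate)
```

Calling `bucket.acquire()` before each `StartQueryExecution` request spreads submissions out instead of letting a burst of reports hit the service limit at once.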
Tips & Tricks for
Performance
(This would save you s*** tons of time & money)
● Order your data
○ Columnar formats work best if you write the data ordered by a commonly used filter key
● Use Hive bucketing
○ Directs Athena to specific files instead of scanning a whole directory
● Use JDBC
○ Better than the API for large reports (thousands of rows or more returned to the client)
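The first two tips both come down to letting the engine skip data. Columnar formats store min/max statistics per chunk (row groups in Parquet, stripes in ORC), so data sorted by the filter key lets a query skip most chunks, while bucketing hashes a key into a fixed number of files so a lookup on that key touches one file. A toy model of both effects in plain Python: the group size, keys, and the CRC32 hash are illustrative assumptions (Hive uses its own hash function), not how Athena works internally.

```python
import zlib
from typing import List, Sequence, Tuple


def row_groups(keys: Sequence[int], group_size: int) -> List[Tuple[int, int]]:
    """Split keys (in write order) into fixed-size groups and keep each group's (min, max)."""
    return [(min(keys[i:i + group_size]), max(keys[i:i + group_size]))
            for i in range(0, len(keys), group_size)]


def groups_to_read(stats: List[Tuple[int, int]], value: int) -> int:
    """Count groups whose [min, max] range could contain `value`; the rest can be skipped."""
    return sum(1 for lo, hi in stats if lo <= value <= hi)


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket file; CRC32 stands in for Hive's own hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


if __name__ == "__main__":
    unsorted_keys = [i % 10 for i in range(100)]  # every group holds keys 0..9
    print(groups_to_read(row_groups(unsorted_keys, 10), 3))          # all 10 groups
    print(groups_to_read(row_groups(sorted(unsorted_keys), 10), 3))  # just 1 group
```

The same 100 keys cost ten group reads when written unordered and one when written sorted, which is the effect the "order your data" tip describes.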
Read more about our experience working with and optimizing
Athena, and other cool stuff we are doing, at
similarweb.engineering
Feel free to send me your CV at
ido.senesh@similarweb.com
or at linkedin.com/in/senesh
Thank you!
Roy Hasson
royon@amazon.com
@royhasson http://bit.ly/2SJ6WBa
Ido Senesh
ido.senesh@similarweb.com
linkedin.com/in/senesh
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT. AWS Analytics Services - When to use what? With SimilarWeb. Roy Hasson, Business Development Lead – Analytics and Data Lakes, Amazon Web Services. DAT201. Ido Senesh, Sr. Software Engineer, SimilarWeb.
  • 3. Modern Data Challenges: There is more data than people think. Data grows >10x every 5 years; data platforms need to scale 1,000x and live for 15 years.
  • 4. Modern Data Challenges: There are more people accessing data (data scientists, analysts, business users, applications), and more requirements for making data available (secure, real-time, flexible, scalable).
  • 5. Modern Data Challenges: democratization of data vs. governance & control. There are more people working with data than ever before. How do I provide democratized access to data that enables informed decisions while, at the same time, enforcing data governance and preventing mismanagement of the data?
  • 6. AWS databases and analytics: broad and deep portfolio, built for builders. Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect. Data lake: S3/Amazon Glacier; AWS Glue (ETL & Data Catalog); Lake Formation (data lakes). Analytics: Amazon Redshift (data warehousing); Amazon EMR (Hadoop + Spark); Athena (interactive analytics); Kinesis Analytics (real-time); Amazon Elasticsearch Service (operational analytics). Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server); Aurora (MySQL, PostgreSQL); DynamoDB (key value, document); ElastiCache (Redis, Memcached); Neptune (graph); Timestream (time series); QLDB (ledger database); RDS on VMware. Blockchain: Managed Blockchain; Blockchain Templates. Business intelligence & machine learning: Amazon QuickSight; Amazon SageMaker; Amazon Comprehend; Amazon Rekognition; Amazon Lex; Amazon Transcribe; AWS DeepLens. AWS Marketplace: 250+ solutions; 730+ database solutions; 600+ analytics solutions; 25+ blockchain solutions; 20+ data lake solutions; 30+ solutions.
  • 7. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
  • 8. More data lakes & analytics on AWS than anywhere else
  • 9. Typical steps of building a data lake: (1) set up storage; (2) move data; (3) cleanse, prep, and catalog data; (4) configure and enforce security and compliance policies; (5) make data available for analytics.
  • 11. Amazon Kinesis Data Streams
  • 12. Amazon Managed Streaming for Apache Kafka (Amazon MSK)
  • 13. Amazon Kinesis Data Firehose
  • 14. Ingestion: Streaming Events. Sources (devices, web, sensors, social, EDW) send events that are ingested in real time with Amazon Kinesis; Amazon Kinesis Data Analytics provides real-time data analysis so you can take action; streaming data is output to select destinations in an optimized file format, e.g. s3://bucket/year=yyyy/month=mm/file.parquet or s3://bucket/year=yyyy/month=mm/file.orc.
  • 15. AWS Database Migration Service
  • 16. AWS Glue ETL
  • 17. Ingestion: Database and Data Warehouse. Sources (devices, web, sensors, social, EDW) land in S3 as an initial full load (s3://bucket/table/LOAD001.csv) followed by timestamped change files (s3://bucket/table/20181127-1134010000.csv), which are then converted into partitioned, optimized formats (s3://bucket/year=yyyy/month=mm/file.parquet, s3://bucket/year=yyyy/month=mm/file.orc).
  • 18. Ingestion: How to choose (decision tree). Criteria: event stream or batch operation; database source; CDC (snapshot or incremental); persist data; real-time analytics; open source; seamless scaling. Outcomes shown include Amazon DMS and Amazon MSK.
  • 20. AWS Glue Data Catalog: search and explore available data. Unified metadata repository across relational databases, Amazon RDS, Amazon Redshift, and Amazon S3; a single searchable view into your data, no matter where it is stored; ability to automatically crawl and classify your data; augment technical metadata with business metadata for tables; manage access to data using fine-grained access controls, even finer with AWS Lake Formation; Apache Hive metastore compatible and integrated with AWS analytics services.
  • 21. AWS Glue Crawlers: automatically catalog your data. Crawlers automatically build your Data Catalog and keep it in sync: they discover new data and extract schema definitions; detect schema changes and version tables; detect Hive-style partitions on Amazon S3; offer built-in classifiers for popular types and custom classifiers using Grok expressions; and run ad hoc or on a schedule. They are serverless: you only pay when a crawler runs.
  • 22. AWS Lake Formation (join the preview): build, secure, and manage a data lake in days. Build a data lake in days, not months: build and deploy a fully managed data lake with a few clicks. Enforce security policies across multiple services: centrally define security, governance, and auditing policies in one place and enforce them for all users and all applications. Combine different analytics approaches: empower analyst and data scientist productivity with self-service discovery and safe access to all data from a single catalog.
  • 23. Hive Metastore Client for AWS Glue Data Catalog: connect Hive Metastore-compatible platforms to the AWS Glue Data Catalog; Apache Hive 2.x compatible; Apache 2.0 license. https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
  • 24. Data Catalog: AWS Glue Data Catalog. Crawl data sources and catalog schemas & partitions; connect Hive-compatible sources via the open connector; search and discover data in your data lake; integrated with AWS analytics tools.
  • 25. Catalog: How to choose (decision tree). Criteria: auto-discovery; AWS analytics integration; governance and fine-grained access control (FGAC). Options: AWS Lake Formation, the AWS Glue Data Catalog, a Hive Metastore for managed Hadoop on RDS, and the Glue Data Catalog open connector.
  • 27. AWS Glue ETL
  • 28. Amazon EMR
  • 29. Amazon Redshift
  • 30. Data Transformation: structured and unstructured data in the raw S3 bucket, together with other real-time streaming sources, is transformed into a transformed S3 bucket for querying. Sometimes ELT is a better option.
  • 31. Process: How to choose (decision tree). Criteria: consistent schema/big data; cluster customization; serverless; SQL-based transforms; transactional; variable schema/small data; job under 15 minutes. Outcomes shown include Apache Spark and Python Shell.
  • 33. Amazon Athena (diagram: Athena queries the data lake through the AWS Glue Data Catalog, subject to permissions, serving reporting & analytics, machine learning, and custom applications in the AWS Cloud)
  • 34. Amazon EMR Notebooks in the Console: a managed analytics environment based on Jupyter Notebooks. Users open EMR-managed notebooks, based on Jupyter, from the AWS Management Console for EMR; notebook files are auto-saved to your S3 bucket; queries run on your remote Amazon EMR clusters (diagram: EMR VPC and customer VPC).
  • 35. Amazon Elasticsearch Service
  • 36. Amazon QuickSight
  • 37. Serve and Consume (diagram: the data lake and AWS Glue Data Catalog, governed by permissions, serving reporting & analytics, machine learning, and custom applications in the AWS Cloud)
  • 38. Serve: How to choose (decision tree). Criteria: interactive query; free-form search; millisecond response; serverless; interactive code; repeated queries.
  • 39. Build with flexibility in mind: open source; secure; integrated; managed & elastic.
  • 40. A queryable interface for the entire goddamn internet: SimilarWeb’s Lead Generator
  • 41. SimilarWeb’s Mission: SimilarWeb gives you digital market intelligence for every website and mobile app worldwide to understand, track, and grow your market share.
  • 42. Some Numbers: 500M websites; 100+ dimensions; 50+ countries.
  • 43. The Use Case: the internet as measured by SimilarWeb, 500M websites × 100+ dimensions × 50+ countries. A salesperson asks: "Show me all websites with 50M+ monthly visits from the United States, a 60%+ mobile share, a bounce rate below 10%, more than 30% of visits by men aged 18-25, and traffic that spiked by 30%+ in the past year."
  • 44. Billions of records to query & process for each report
  • 45. Requirements: zero operations; cost efficient; high availability; responsiveness (~seconds query time); data is stored on S3; schema evolution.
  • 46. Is it possible?
  • 47. Amazon Athena
  • 48. Amazon Athena: Serverless. Just connect to an endpoint and submit your queries.
  • 49. Amazon Athena: Cost Effectiveness. Running 10k queries for a monthly cost of $150.
  • 50. Amazon Athena: fully automated data discovery using Glue crawlers.
  • 51. Amazon Athena: SQL support. Provide customers with rich features (ordering, aggregations, analytic functions) without any effort on our side.
  • 52. Amazon Athena: serverless; cost effectiveness; fully automated data discovery; SQL support.
  • 53. Preparing for production?
  • 54. Production Readiness
  • 58. Production Readiness ● Out-of-the-box limit: 5 concurrent queries per second ○ Soft limit: open a limit-increase ticket with AWS Support ● Workgroups: control costs and limit parallelism per business case ● Monitoring: a keep-alive query every second is not a good idea (it cost us $1,000) ● Disaster recovery ○ Data: S3 Cross-Region Replication, provided by AWS ○ Metadata: you need to take care of it yourself (Lambda, crawlers)
  • 59. Tips & Tricks for Performance (these will save you s*** tons of time & money)
  • 62. Production Readiness ● Order your data ○ Columnar formats work best if you write the data ordered by a commonly used filter key ● Use Hive bucketing ○ Directs Athena to specific files instead of scanning a whole directory ● Use JDBC ○ Better than the API for large reports (thousands of rows or more returned to the client)
  • 63. Read more about our experience working with and optimizing Athena, and other cool stuff we are doing, at similarweb.engineering
  • 64. Feel free to send me your CV at ido.senesh@similarweb.com or at linkedin.com/in/senesh
  • 66. Thank you! Roy Hasson: royon@amazon.com, @royhasson, http://bit.ly/2SJ6WBa. Ido Senesh: ido.senesh@similarweb.com, linkedin.com/in/senesh
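Slides 14 and 17 show objects written under Hive-style prefixes of the form year=yyyy/month=mm. Below is a minimal sketch of how a producer might derive such a key, assuming timezone-aware event timestamps. The helper name `partitioned_key`, the bucket, and the file name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def partitioned_key(bucket: str, event_time: datetime, file_name: str) -> str:
    """Build a Hive-style partitioned S3 URI (year=YYYY/month=MM) for one object.

    `bucket` and `file_name` are placeholders; only the year=/month= layout
    comes from the slides.
    """
    if event_time.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them instead of guessing a zone
        raise ValueError("event_time must be timezone-aware")
    ts = event_time.astimezone(timezone.utc)  # one clock for every producer
    return f"s3://{bucket}/year={ts.year:04d}/month={ts.month:02d}/{file_name}"


if __name__ == "__main__":
    # 01:00 on April 1 at UTC+2 is 23:00 on March 31 in UTC
    local = datetime(2019, 4, 1, 1, 0, tzinfo=timezone(timedelta(hours=2)))
    print(partitioned_key("bucket", local, "file.parquet"))
    # prints s3://bucket/year=2019/month=03/file.parquet
```

Normalizing to UTC first means producers in different time zones agree on which month partition an event belongs to.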
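Slide 17 shows a full-load file (LOAD001.csv) followed by timestamped change files, and slide 18 weighs snapshot versus incremental (CDC) ingestion. The sketch below replays change records on top of a snapshot to rebuild the current state of a table. It assumes each change row carries an operation flag named `Op` with values I, U, or D (AWS DMS writes a similar flag in its S3 change files) and a hypothetical primary-key column `id`. It illustrates the idea and does not reflect how any AWS service merges data internally.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def apply_cdc(snapshot: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Replay CDC change records (Op = I/U/D) on top of a full-load snapshot.

    `changes` must be in commit order. Returns rows sorted by key for
    deterministic output; the input rows are not modified.
    """
    table: Dict[str, Row] = {row[key]: dict(row) for row in snapshot}
    for change in changes:
        op = change["Op"]
        record = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            table[record[key]] = record   # upsert the latest image of the row
        elif op == "D":
            table.pop(record[key], None)  # delete the row if it exists
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return sorted(table.values(), key=lambda r: r[key])
```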
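Slide 21 notes that Glue crawlers detect Hive-style partitions on Amazon S3. The simplified illustration below parses key=value directory segments out of an S3 path. It is not the crawler's actual algorithm, and the function name `hive_partitions` is hypothetical.

```python
from typing import Dict


def hive_partitions(s3_uri: str) -> Dict[str, str]:
    """Extract key=value partition columns from a Hive-style S3 path."""
    # Drop the scheme, then skip the bucket (first segment) and file name (last)
    path = s3_uri.split("://", 1)[-1]
    segments = path.split("/")[1:-1]
    partitions: Dict[str, str] = {}
    for segment in segments:
        if "=" in segment:
            name, _, value = segment.partition("=")
            partitions[name] = value
    return partitions


if __name__ == "__main__":
    print(hive_partitions("s3://bucket/year=2019/month=03/file.parquet"))
    # prints {'year': '2019', 'month': '03'}
```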
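The salesperson's request on slide 43 maps naturally onto a single SQL filter that Athena (Presto-based SQL) could run. The builder below is a sketch only. The table `websites_monthly` and every column name are hypothetical, and it assumes that shares and rates are stored as fractions between 0 and 1.

```python
def lead_query(country: str = "US", table: str = "websites_monthly") -> str:
    """Translate the slide 43 request into one SQL statement (hypothetical schema)."""
    # Only interpolate validated values, to avoid SQL injection
    if not (len(country) == 2 and country.isalpha() and country.isupper()):
        raise ValueError("country must be a two-letter upper-case code")
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    conditions = [
        f"country = '{country}'",
        "monthly_visits >= 50000000",   # 50M+ monthly visits
        "mobile_share >= 0.60",         # 60%+ mobile share
        "bounce_rate < 0.10",           # bounce rate below 10%
        "share_male_18_25 > 0.30",      # more than 30% of visits by men aged 18-25
        "yoy_traffic_growth >= 0.30",   # traffic up 30%+ over the past year
    ]
    return f"SELECT domain FROM {table} WHERE " + " AND ".join(conditions)


if __name__ == "__main__":
    print(lead_query())
```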
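Slide 48 sums up Athena as "just connect to an endpoint and submit your queries." With the AWS SDK for Python (boto3), submitting a query means calling `start_query_execution` and then polling `get_query_execution` until the query reaches a terminal state. The sketch below takes the client as a parameter so it can be exercised without AWS credentials. The database name, S3 output location, and polling interval are placeholders you would supply.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a query and block until it finishes; return its QueryExecutionId.

    `client` is expected to behave like boto3.client("athena").
    Raises RuntimeError if the query ends in any state other than SUCCEEDED.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)  # QUEUED or RUNNING: wait before asking again
    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended in {state}: {reason}")
    return query_id
```

In production you would pass `boto3.client("athena")`. `start_query_execution` also accepts a `WorkGroup` argument, which ties a query to a workgroup's settings.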
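Slide 49 reports running 10k queries for about $150 a month. Because Athena bills by data scanned, that budget implies an average scan size per query. The sketch below assumes the on-demand price of $5 per TB scanned, with each query rounded up to the nearest MB and billed for at least 10 MB. This was Athena's published pricing model at the time; verify current regional prices before relying on it. Under these assumptions, $150 / $5 = 30 TB per month, or about 3 GB per query across 10,000 queries.

```python
# Assumptions (check current pricing for your region):
PRICE_PER_TB_USD = 5.0          # on-demand price per TB scanned
MB = 1024 ** 2
TB = 1024 ** 4                  # treat 1 TB as 2**40 bytes
MIN_BILLED_BYTES = 10 * MB      # per-query billing minimum


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimated charge for one query: bytes rounded up to whole MB, 10 MB minimum."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    billed = -(-billed // MB) * MB  # integer ceiling to the next whole MB
    return billed / TB * PRICE_PER_TB_USD


def avg_scan_per_query_bytes(monthly_budget_usd: float, queries_per_month: int) -> float:
    """Average bytes each query may scan for the monthly total to stay on budget."""
    total_bytes = monthly_budget_usd / PRICE_PER_TB_USD * TB
    return total_bytes / queries_per_month


if __name__ == "__main__":
    per_query = avg_scan_per_query_bytes(150, 10_000)
    print(f"{per_query / 1024 ** 3:.3f} GiB per query")  # prints 3.072 GiB per query
```

Because of the 10 MB floor, very small but frequent queries (such as health checks) are each billed at the minimum, which adds up over a month.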
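Slide 58 warns that Athena's out-of-the-box concurrency limit is low and that parallelism needs to be controlled. One client-side way to respect such a cap is to wrap each query in a semaphore and to retry with exponential backoff and jitter when the service throttles. The sketch below takes this approach under stated assumptions. The class name, the defaults, and the `is_throttle` predicate are illustrative. Each job should include waiting for its query to finish, so that a slot is held for the query's whole lifetime. The cap only applies within one process; several services sharing an account would need a shared limiter or separate quotas.

```python
import random
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledRunner:
    """Cap concurrently running jobs and retry throttled ones with backoff."""

    def __init__(
        self,
        max_in_flight: int = 5,
        max_attempts: int = 5,
        base_delay: float = 0.5,
        is_throttle: Callable[[Exception], bool] = lambda exc: False,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_in_flight < 1 or max_attempts < 1:
            raise ValueError("max_in_flight and max_attempts must be >= 1")
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._is_throttle = is_throttle
        self._sleep = sleep

    def run(self, job: Callable[[], T]) -> T:
        # Blocks while max_in_flight jobs already hold a slot; the slot is kept
        # during backoff so retries never exceed the cap
        with self._slots:
            for attempt in range(self._max_attempts):
                try:
                    return job()
                except Exception as exc:
                    last_try = attempt == self._max_attempts - 1
                    if last_try or not self._is_throttle(exc):
                        raise
                    # Full-jitter exponential backoff: wait 0..base*2^attempt seconds
                    self._sleep(random.uniform(0, self._base_delay * 2 ** attempt))
        raise RuntimeError("unreachable: the loop always returns or raises")
```

With boto3, throttled calls surface as `botocore.exceptions.ClientError`. Assuming Athena reports the `TooManyRequestsException` error code, you could pass `is_throttle=lambda e: getattr(e, "response", {}).get("Error", {}).get("Code") == "TooManyRequestsException"`.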
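The first two tips on slide 62 both come down to reading fewer bytes, and the sketch below illustrates them under simplifying assumptions. Columnar formats such as Parquet and ORC keep min/max statistics per block (row group or stripe), so a filter can skip any block whose range cannot match. Sorting by the filter key makes those ranges narrow. Bucketing assigns each row to one of N files by hashing a key, so an equality filter on that key points to a single file. The CRC32 hash here is a stand-in: Hive and Athena use Hive's own bucketing hash, so real bucket numbers differ. All function names are hypothetical.

```python
import zlib
from typing import List, Sequence, Tuple


def block_stats(values: Sequence[int], block_size: int) -> List[Tuple[int, int]]:
    """Min/max per block, like the statistics Parquet/ORC store per row group/stripe."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_to_read(stats: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] range overlaps the filter lo <= key <= hi."""
    return sum(1 for mn, mx in stats if mx >= lo and mn <= hi)


def bucket_for(key: str, num_buckets: int) -> int:
    """Illustrative bucketing: CRC32 stands in for Hive's real bucketing hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # 1,000 distinct keys in a scrambled order (919 is coprime with 1,000)
    unsorted = [(919 * i) % 1000 for i in range(1000)]
    ordered = sorted(unsorted)
    for name, data in (("unsorted", unsorted), ("sorted", ordered)):
        stats = block_stats(data, block_size=100)
        print(name, blocks_to_read(stats, 100, 109), "of", len(stats), "blocks")
    print("key 'similarweb.com' maps to bucket", bucket_for("similarweb.com", 8))
```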