SlideShare une entreprise Scribd logo
1  sur  32
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rahul Bhartia, Principal Product Manager, S3
March 29th 2018
Transforming Data Lakes
with
Amazon S3 Select & Amazon Glacier Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to expect from the session
1.Introduction
2.Use-cases
3.Key features
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Unmatched durability,
availability, and scalability
Best security, compliance, and audit capability Object-level control
at any scale
Business insight into your
data
Twice as many partner integrationsMost ways to bring
data in
Building a data lake with Amazon S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
With a wide variety of in-place tools…
Amazon Athena Amazon Redshift
Spectrum
Amazon EMR AWS Glue
Amazon S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Today: Compute scales based…
on object size instead of the amount of data you want to process
DATA COMPUTE
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Today: All of these tools…
retrieve a lot of data they don’t need and
do the heavy lifting
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
You have a choice of storage classes
Archive data Infrequently accessed data
Minutes to hours Milliseconds
0.4¢-GB/mo. 1.25¢-GB/mo.
Amazon S3
Standard
Amazon S3
Standard– Infrequent Access
Amazon Glacier
Active data
Milliseconds
From 2.1¢-GB/mo.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Today: You need to….
entire object from Amazon Glacier to Amazon S3
and then use it.
Amazon S3Amazon
Glacier
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Introducing…
Amazon S3 Select and Amazon Glacier Select
Select subset of data from an object based on a SQL expression
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Simple, faster, and cheaper!
Available as an API—no
infrastructure or
administration
Faster performance as
compared to doing it yourself
Pay as you go. The less you
retrieve the more you save.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select
Simple to use
Standard SQL expression
Familiar
Work and scales like GET
requests
Integrated
AWS SDK and Presto
(others coming soon)
Select contents from object instead of retrieving the object
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select
Output
Format: delimited text (CSV, TSV),
JSON …
Clauses Data types Operators Functions
Select String Conditional String
From Integer, Float, Decimal Math Cast
Where Timestamp Logical Math
Boolean String (Like, ||) Aggregate
Input
Format: delimited text (CSV, TSV),
JSON …
Compression: GZIP …
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select: Simple pattern matches
…get-object …object… | awk -F ’{ if($4=="x") print $1}’
...select-object …object… ‘SELECT o._1 WHERE o._4 == “x”…’
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select: Serverless applications
Amazon
S3
AWS
Lambda
Amazon
SNS
S3
Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
After
200 seconds and 11.2 cents
# Download and process all keys
for key in src_keys:
response = s3_client.get_object(Bucket=src_bucket,
Key=key)
contents = response['Body'].read()
for line in contents.split('n')[:-1]:
line_count +=1
try:
data = line.split(',')
srcIp = data[0][:8]
….
Amazon S3 Select: Serverless MapReduce
Before
95 seconds and costs 2.8 cents
# Select IP Address and Keys
for key in src_keys:
response = s3_client.select_object_content
(Bucket=src_bucket, Key=key, expression =
SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as
obj)
contents = response['Body'].read()
for line in contents:
line_count +=1
try:
….
2X Faster at 1/5 of the cost
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Up to 400% Faster
Up to 80% Cheaper
Amazon S3 Select: Accelerating Big Data
Amazon S3
Before:
Amazon S3
S3 Select
After:
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEMO: Amazon S3 Select with Presto
Works with your existing Hive Metastore
Automatically converts predicates into S3 Select requests
Amazon S3
S3 Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Before
Amazon S3 Select: Accelerating big data
After
After
5X Faster with 1/40 of the CPU
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select: Will be supported by…
Amazon Athena Amazon EMR Amazon Redshift
Spectrum
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select: In Preview
• Formats: CSV, JSON
• Compression: GZIP
• Encryption: None
• Encoding: UTF-8
• Integration: AWS SDK for Java and Python and Presto Connector
• Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore
Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Glacier Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Glacier Select
Simple SQL Expression
SELECT and WHERE
Familiar semantics
Work and scales like
RESTORE requests
Integrated
AWS SDK and CLI
Restore selective contents instead of restoring entire object
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Glacier Select
Input
Format: delimited text (CSV, TSV,
PSV, etc.)
Encryption: SSE-KMS, SSE-S3
Output
Format: delimited text (CSV, TSV,
PSV, etc.)
Clauses Data types Operators Functions
Select String Conditional String
From Integer, Float, Decimal Math Cast
Where Timestamp Logical
Boolean String (Like, ||)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Two ways to use Amazon Glacier Select
Using Glacier API
Data directly uploaded to Amazon Glacier
How to use Amazon Glacier Select?
Using S3 API
For data that is lifecycled to Amazon Glacier
from S3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How to use Amazon Glacier Select?
Object Tier
SQL query Output S3 location SNS topic
Current restore-object API arguments
New (optional) restore-object API arguments to use Glacier Select
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Glacier Select
…restore-object …object… | …get-object … object …. | awk -F ’{ if($4==“id") print $1}’
...restore-object …object… ‘SELECT o._1 WHERE o._4 == “id”…’ | …get-object … object ….
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Glacier Select: GA
• Formats: CSV, Any delimiter separated file
• Encryption: SSE- KMS, SSE-S3
• Encoding: UTF-8
• Integration: AWS SDK, CLI, Athena integration (expected 2018)
• Availability: All commercial regions where Amazon Glacier is launched
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Summary
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 Select and Amazon Glacier Select
Select subset of data from an object based on a SQL expression
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Now: Your data lake on AWS
Simple Faster Cheaper
Amazon
Glacier
Amazon
S3
Amazon Redshift
Spectrum
Amazon Athena Amazon EMR
AWS
Lambda
ISVs and Custom
Applications
SELECT
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you

Contenu connexe

Tendances

Develop Containerized Apps with AWS Fargate
Develop Containerized Apps with AWS Fargate Develop Containerized Apps with AWS Fargate
Develop Containerized Apps with AWS Fargate Amazon Web Services
 
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018Amazon Web Services
 
Module 1 - AWSome Day Online Conference 2018
Module 1 - AWSome Day Online Conference 2018Module 1 - AWSome Day Online Conference 2018
Module 1 - AWSome Day Online Conference 2018Amazon Web Services
 
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Amazon Web Services
 
Module 2 - AWSome Day Online Conference 2018
Module 2 - AWSome Day Online Conference 2018Module 2 - AWSome Day Online Conference 2018
Module 2 - AWSome Day Online Conference 2018Amazon Web Services
 
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018Amazon Web Services
 
SRV313 Introduction to Building Web Apps on AWS
 SRV313 Introduction to Building Web Apps on AWS SRV313 Introduction to Building Web Apps on AWS
SRV313 Introduction to Building Web Apps on AWSAmazon Web Services
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Boaz Ziniman
 
Module 4 - AWSome Day Online Conference 2018
Module 4 - AWSome Day Online Conference 2018Module 4 - AWSome Day Online Conference 2018
Module 4 - AWSome Day Online Conference 2018Amazon Web Services
 
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Amazon Web Services
 
Migrazione di Database e Data Warehouse su AWS
Migrazione di Database e Data Warehouse su AWSMigrazione di Database e Data Warehouse su AWS
Migrazione di Database e Data Warehouse su AWSAmazon Web Services
 
Running Mission Critical Workloads on AWS
Running Mission Critical Workloads on AWSRunning Mission Critical Workloads on AWS
Running Mission Critical Workloads on AWSAmazon Web Services
 
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...Amazon Web Services
 
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage GatewayAmazon Web Services
 
Create and Publish AR and VR Apps with Amazon Sumerian
Create and Publish AR and VR Apps with Amazon SumerianCreate and Publish AR and VR Apps with Amazon Sumerian
Create and Publish AR and VR Apps with Amazon SumerianAmazon Web Services
 
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...Amazon Web Services
 
AWSome Day Online Conference 2018 - Module 2
AWSome Day Online Conference 2018 -  Module 2AWSome Day Online Conference 2018 -  Module 2
AWSome Day Online Conference 2018 - Module 2Amazon Web Services
 
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAccelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAmazon Web Services
 
Amazon S3_Updates and Best Practices
Amazon S3_Updates and Best Practices Amazon S3_Updates and Best Practices
Amazon S3_Updates and Best Practices Amazon Web Services
 

Tendances (20)

Develop Containerized Apps with AWS Fargate
Develop Containerized Apps with AWS Fargate Develop Containerized Apps with AWS Fargate
Develop Containerized Apps with AWS Fargate
 
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018
Mastering Identity at Every Layer of the Cake (SEC401-R1) - AWS re:Invent 2018
 
Module 1 - AWSome Day Online Conference 2018
Module 1 - AWSome Day Online Conference 2018Module 1 - AWSome Day Online Conference 2018
Module 1 - AWSome Day Online Conference 2018
 
SRV303 Deep Dive on Amazon EFS
 SRV303 Deep Dive on Amazon EFS SRV303 Deep Dive on Amazon EFS
SRV303 Deep Dive on Amazon EFS
 
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
Optimizing Storage for Enterprise Workloads and Migrations (STG202) - AWS re:...
 
Module 2 - AWSome Day Online Conference 2018
Module 2 - AWSome Day Online Conference 2018Module 2 - AWSome Day Online Conference 2018
Module 2 - AWSome Day Online Conference 2018
 
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018
[NEW LAUNCH!] Lambda Layers (SRV375) - AWS re:Invent 2018
 
SRV313 Introduction to Building Web Apps on AWS
 SRV313 Introduction to Building Web Apps on AWS SRV313 Introduction to Building Web Apps on AWS
SRV313 Introduction to Building Web Apps on AWS
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
 
Module 4 - AWSome Day Online Conference 2018
Module 4 - AWSome Day Online Conference 2018Module 4 - AWSome Day Online Conference 2018
Module 4 - AWSome Day Online Conference 2018
 
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
 
Migrazione di Database e Data Warehouse su AWS
Migrazione di Database e Data Warehouse su AWSMigrazione di Database e Data Warehouse su AWS
Migrazione di Database e Data Warehouse su AWS
 
Running Mission Critical Workloads on AWS
Running Mission Critical Workloads on AWSRunning Mission Critical Workloads on AWS
Running Mission Critical Workloads on AWS
 
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...
How to Secure Sensitive Customer Data Using Amazon CloudFront - AWS Online Te...
 
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
SRV302 Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway
 
Create and Publish AR and VR Apps with Amazon Sumerian
Create and Publish AR and VR Apps with Amazon SumerianCreate and Publish AR and VR Apps with Amazon Sumerian
Create and Publish AR and VR Apps with Amazon Sumerian
 
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
 
AWSome Day Online Conference 2018 - Module 2
AWSome Day Online Conference 2018 -  Module 2AWSome Day Online Conference 2018 -  Module 2
AWSome Day Online Conference 2018 - Module 2
 
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAccelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
 
Amazon S3_Updates and Best Practices
Amazon S3_Updates and Best Practices Amazon S3_Updates and Best Practices
Amazon S3_Updates and Best Practices
 

Similaire à Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS Online Tech Talks

Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Amazon Web Services
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfSasikumarPalanivel3
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfsaidbilgen
 
SRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GASRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GAAmazon Web Services
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Amazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksAmazon Web Services
 
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech Talks
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech TalksDeep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech Talks
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech TalksAmazon Web Services
 
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS SummitAmazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS SummitAmazon Web Services
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceAWS Germany
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...Amazon Web Services
 
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...Amazon Web Services
 

Similaire à Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS Online Tech Talks (20)

Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
 
SRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GASRV208 S3 One Zone-IA and S3 Select GA
SRV208 S3 One Zone-IA and S3 Select GA
 
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select...
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech TalksHow to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
How to Build a Data Lake in Amazon S3 & Amazon Glacier - AWS Online Tech Talks
 
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech Talks
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech TalksDeep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech Talks
Deep Dive on New Features in Amazon S3 & Glacier - AWS Online Tech Talks
 
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS SummitAmazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
Amazon S3: Updates and Best Practices - SRV301 - Chicago AWS Summit
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Preparing Data for the Lake
Preparing Data for the LakePreparing Data for the Lake
Preparing Data for the Lake
 
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...
Deep Dive on Amazon S3: Manage Operations Across Amazon S3 Objects at Scale (...
 
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...
Introducing S3 Batch Operations: Managing Billions of Objects in Amazon S3 at...
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS Online Tech Talks

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rahul Bhartia, Principal Product Manager, S3 March 29th 2018 Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What to expect from the session 1.Introduction 2.Use-cases 3.Key features
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Unmatched durability, availability, and scalability Best security, compliance, and audit capability Object-level control at any scale Business insight into your data Twice as many partner integrationsMost ways to bring data in Building a data lake with Amazon S3
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. With a wide variety of in-place tools… Amazon Athena Amazon Redshift Spectrum Amazon EMR AWS Glue Amazon S3
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Today: Compute scales based… on object size instead of the amount of data you want to process DATA COMPUTE
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Today: All of these tools… retrieve a lot of data they don’t need and do the heavy lifting
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. You have a choice of storage classes Archive data Infrequently accessed data Minutes to hours Milliseconds 0.4¢-GB/mo. 1.25¢-GB/mo. Amazon S3 Standard Amazon S3 Standard– Infrequent Access Amazon Glacier Active data Milliseconds From 2.1¢-GB/mo.
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Today: You need to…. entire object from Amazon Glacier to Amazon S3 and then use it. Amazon S3Amazon Glacier
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Introducing… Amazon S3 Select and Amazon Glacier Select Select subset of data from an object based on a SQL expression
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Simple, faster, and cheaper! Available as an API—no infrastructure or administration Faster performance as compared to doing it yourself Pay as you go. The less you retrieve the more you save.
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select Simple to use Standard SQL expression Familiar Work and scales like GET requests Integrated AWS SDK and Presto (others coming soon) Select contents from object instead of retrieving the object
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select Output Format: delimited text (CSV, TSV), JSON … Clauses Data types Operators Functions Select String Conditional String From Integer, Float, Decimal Math Cast Where Timestamp Logical Math Boolean String (Like, ||) Aggregate Input Format: delimited text (CSV, TSV), JSON … Compression: GZIP …
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select: Simple pattern matches …get-object …object… | awk -F ’{ if($4=="x") print $1}’ ...select-object …object… ‘SELECT o._1 WHERE o._4 == “x”…’
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select: Serverless applications Amazon S3 AWS Lambda Amazon SNS S3 Select
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. After 200 seconds and 11.2 cents # Download and process all keys for key in src_keys: response = s3_client.get_object(Bucket=src_bucket, Key=key) contents = response['Body'].read() for line in contents.split('n')[:-1]: line_count +=1 try: data = line.split(',') srcIp = data[0][:8] …. Amazon S3 Select: Serverless MapReduce Before 95 seconds and costs 2.8 cents # Select IP Address and Keys for key in src_keys: response = s3_client.select_object_content (Bucket=src_bucket, Key=key, expression = SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as obj) contents = response['Body'].read() for line in contents: line_count +=1 try: …. 2X Faster at 1/5 of the cost
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Up to 400% Faster Up to 80% Cheaper Amazon S3 Select: Accelerating Big Data Amazon S3 Before: Amazon S3 S3 Select After:
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEMO: Amazon S3 Select with Presto Works with your existing Hive Metastore Automatically converts predicates into S3 Select requests Amazon S3 S3 Select
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Before Amazon S3 Select: Accelerating big data After After 5X Faster with 1/40 of the CPU
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select: Will be supported by… Amazon Athena Amazon EMR Amazon Redshift Spectrum
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select: In Preview • Formats: CSV, JSON • Compression: GZIP • Encryption: None • Encoding: UTF-8 • Integration: AWS SDK for Java and Python and Presto Connector • Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Glacier Select
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Glacier Select Simple SQL Expression SELECT and WHERE Familiar semantics Work and scales like RESTORE requests Integrated AWS SDK and CLI Restore selective contents instead of restoring entire object
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Glacier Select Input Format: delimited text (CSV, TSV, PSV, etc.) Encryption: SSE-KMS, SSE-S3 Output Format: delimited text (CSV, TSV, PSV, etc.) Clauses Data types Operators Functions Select String Conditional String From Integer, Float, Decimal Math Cast Where Timestamp Logical Boolean String (Like, ||)
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Two ways to use Amazon Glacier Select Using Glacier API Data directly uploaded to Amazon Glacier How to use Amazon Glacier Select? Using S3 API For data that is lifecycled to Amazon Glacier from S3
  • 26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How to use Amazon Glacier Select? Object Tier SQL query Output S3 location SNS topic Current restore-object API arguments New (optional) restore-object API arguments to use Glacier Select
  • 27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Using Glacier Select …restore-object …object… | …get-object … object …. | awk -F ’{ if($4==“id") print $1}’ ...restore-object …object… ‘SELECT o._1 WHERE o._4 == “id”…’ | …get-object … object ….
  • 28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Glacier Select: GA • Formats: CSV, Any delimiter separated file • Encryption: SSE- KMS, SSE-S3 • Encoding: UTF-8 • Integration: AWS SDK, CLI, Athena integration (expected 2018) • Availability: All commercial regions where Amazon Glacier is launched
  • 29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Summary
  • 30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Select and Amazon Glacier Select Select subset of data from an object based on a SQL expression
  • 31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Now: Your data lake on AWS Simple Faster Cheaper Amazon Glacier Amazon S3 Amazon Redshift Spectrum Amazon Athena Amazon EMR AWS Lambda ISVs and Custom Applications SELECT
  • 32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you