© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Dive on Amazon S3
and Glacier Architecture
Craig Cotton, Director Product Management – Amazon S3
Henry Zhang, Senior Product Manager – Glacier
Jamal Mazhar, Head of Infrastructure and DevOps – Sprinklr
AGENDA
• Deep dive on Amazon S3 architecture
• Deep dive on Glacier architecture
• Guest Speaker: Jamal Mazhar, Head of Infrastructure and DevOps @ Sprinklr
The AWS Storage Portfolio
• File: Amazon EFS
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• Data Transfer: AWS Direct Connect, AWS Snow Family, AWS Storage Gateway, Amazon Kinesis, S3 Transfer Acceleration, EFS File Sync, 3rd Party Connectors
Benefits of Amazon S3 & Glacier
• Durable, Available, & Scalable
• Security & Compliance
• Query In Place
• Flexible Management
• Ecosystem
Architecture Deep Dive
Amazon S3
Amazon S3 By The Numbers
• 44 Availability Zones (16 more coming in 2018)
• 16 Regions (5 more coming in 2018)
• Trillions of objects
• Millions of requests per second
• One of the first three AWS services (2006)
• 99.999999999% durability
Amazon S3 Architecture
[Architecture diagram. Components: End User, Internet, Load Balancers, API Servers, Metadata Storage, Blob Storage. Requests shown: PUT, GET, DELETE.]
Amazon S3 Availability Zones
• S3 stores data in at least 3 Availability Zones (AZs)
• Each AZ can consist of up to 8 physical data centers
• Unavailability of a data center or an AZ does not impact overall S3 availability
• Low-latency private networks connect data centers and AZs
• AZs are physically separate: even extremely uncommon disasters would only affect a single AZ
• Data is automatically distributed across a minimum of 3 geographically separated AZs within an AWS Region
Amazon S3 Storage Classes & Transitions
S3 Standard
• Active data
• Synchronous access
• Milliseconds retrieval
• 2.1¢ per GB-month
S3 Standard – Infrequent Access
• Infrequently accessed data
• Synchronous access
• Milliseconds retrieval
• 1.25¢ per GB-month
Amazon Glacier
• Archive data
• Asynchronous access
• Minutes-to-hours retrieval
• 0.4¢ per GB-month
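To make the price gap between the three classes concrete, here is a minimal sketch that computes the monthly storage charge from the per-GB prices quoted on the slide. The function name and the 10 TB example size are illustrative assumptions, and the sketch ignores request, retrieval, and minimum-duration charges.

```python
from decimal import Decimal

# Per-GB monthly prices in US cents, copied from the slide (2017 figures).
PRICE_CENTS_PER_GB_MONTH = {
    "S3 Standard": Decimal("2.1"),
    "S3 Standard-IA": Decimal("1.25"),
    "Amazon Glacier": Decimal("0.4"),
}


def monthly_storage_cost_usd(storage_class: str, gigabytes: int) -> Decimal:
    """Return the monthly storage charge in dollars for one storage class."""
    cents = PRICE_CENTS_PER_GB_MONTH[storage_class] * Decimal(gigabytes)
    return cents / Decimal(100)


if __name__ == "__main__":
    # Assumption: 10 TB is treated as 10,000 GB for simplicity.
    for name in PRICE_CENTS_PER_GB_MONTH:
        cost = monthly_storage_cost_usd(name, 10_000)
        print(f"{name}: ${cost:.2f} per month for 10 TB")
```

Decimal arithmetic is used so the cent values from the slide are not distorted by binary floating point.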
Amazon S3 Security, Encryption & Compliance
The broadest set of tools in the industry
Security
• IAM and Bucket Policies
• Access Control Lists
• Audit logging with CloudTrail & Alerts with CloudWatch
• Secure CloudFormation templates
• Amazon Macie
• S3 Console Permission Checks
Encryption
• Encryption in transit with TLS
• SSE-S3 – Amazon S3 manages data & keys
• SSE-C – Customer managed keys
• SSE-KMS – Master keys in KMS
• CSE – 100% Customer managed
• Default Bucket Encryption
• Encryption Status in Inventory
Compliance
• PCI-DSS
• HIPAA/HITECH
• FedRAMP
• FISMA
• EU Data Protection Directive
Cross-region Replication
Automatically replicate data to any other AWS Region
• Replicate by object, bucket, or prefix
• Support for SSE-KMS encrypted objects
• Ownership overwrite
  • Change the object owner in the destination region
[Diagram: Region A replicating to Region B over cross-region connectivity]
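As an illustration of prefix-scoped replication, the sketch below builds a replication configuration as a plain Python dictionary. The field names follow the S3 replication configuration as I understand it (a role plus a list of rules with a prefix, status, and destination), so check them against the current S3 API reference before use. The ARNs, rule ID, and function name are hypothetical placeholders. Both buckets need versioning enabled for replication to work.

```python
import json


def build_replication_config(role_arn: str, dest_bucket_arn: str,
                             prefix: str = "",
                             storage_class: str = "STANDARD_IA") -> dict:
    """Build a one-rule replication configuration scoped to a key prefix."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [{
            "ID": "replicate-" + (prefix or "all"),
            "Prefix": prefix,          # empty prefix means the whole bucket
            "Status": "Enabled",
            "Destination": {
                "Bucket": dest_bucket_arn,
                "StorageClass": storage_class,
            },
        }],
    }


if __name__ == "__main__":
    # Hypothetical ARNs, for illustration only.
    config = build_replication_config(
        "arn:aws:iam::111122223333:role/example-replication-role",
        "arn:aws:s3:::example-destination-bucket",
        prefix="photos/",
    )
    # With boto3 this dict would be passed as ReplicationConfiguration
    # to put_bucket_replication; here it is only printed.
    print(json.dumps(config, indent=2))
```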
Cross Region Replication Examples
• S3 Standard → S3 Standard
• S3 Standard → S3 Standard-IA (S-IA)
• S3 Standard → Glacier (zero-day lifecycle policy to Glacier)
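The "zero-day lifecycle policy" in the last example can be sketched as a lifecycle configuration on the destination bucket that transitions objects to Glacier after 0 days. The dictionary below follows the S3 lifecycle configuration shape as I understand it; the rule ID and function name are assumptions made for illustration.

```python
import json


def build_zero_day_glacier_rule(prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier at day 0."""
    return {
        "Rules": [{
            "ID": "zero-day-to-glacier",       # hypothetical rule name
            "Filter": {"Prefix": prefix},      # empty prefix: every object
            "Status": "Enabled",
            "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
        }]
    }


if __name__ == "__main__":
    # With boto3 this dict would be passed as LifecycleConfiguration to
    # put_bucket_lifecycle_configuration on the destination bucket.
    print(json.dumps(build_zero_day_glacier_rule(), indent=2))
```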
Do More With Your In-place Data
Data Lake Storage
• Athena
• Redshift Spectrum
• QuickSight
• EMR
IoT Storage
• AWS IoT
• Greengrass
• Other IoT sensors
Machine Learning & AI Storage
• Rekognition
• Lex
• Polly
• MXNet & TensorFlow
Maximize Throughput with Amazon S3
Amazon S3 automatically scales to thousands of requests per second per prefix based on your steady state traffic
• Amazon S3 automatically partitions your prefixes within hours, adjusting to increases in request rates
• Consider using a three- or four-character hash (see next slide for details)
Using a Three or Four Character Hash
examplebucket/232a-2017-26-05-15-00-00/cust1234234/photo1.jpg
examplebucket/7b54-2017-26-05-15-00-00/cust3857422/photo2.jpg
examplebucket/921c-2017-26-05-15-00-00/cust1248473/photo2.jpg
A bit more LIST friendly:
examplebucket/animations/232a-2017-26-05-15-00-00/cust1234234/animation1.obj
examplebucket/videos/ba65-2017-26-05-15-00-00/cust8474937/video2.mpg
examplebucket/photos/8761-2017-26-05-15-00-00/cust1248473/photo3.jpg
Random hash should come before patterns such as dates and sequential IDs
Always first ensure that your application can accommodate
Due to recent Amazon S3 performance enhancements, most customers no longer need to worry about introducing entropy in key names
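The key layout above can be sketched in a few lines. The slide calls the prefix a random hash; this sketch instead derives it from the key itself with MD5, which is an assumption on my part, chosen so that the full key can be recomputed later without storing the prefix. The function name and the `group` parameter (for the LIST-friendly variant) are illustrative.

```python
import hashlib


def hashed_key(natural_key: str, hash_chars: int = 4, group: str = "") -> str:
    """Prepend a short hex hash so keys spread across S3 partitions.

    natural_key: the date/ID-based key the application would otherwise use.
    group: optional leading folder (e.g. "photos") to keep LIST calls usable.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    prefixed = f"{digest[:hash_chars]}-{natural_key}"
    return f"{group}/{prefixed}" if group else prefixed


if __name__ == "__main__":
    key = "2017-26-05-15-00-00/cust1234234/photo1.jpg"
    print(hashed_key(key))                  # hash first, then the date pattern
    print(hashed_key(key, group="photos"))  # LIST-friendly variant
```

As the slide notes, most workloads no longer need this after the S3 performance enhancements, so treat it as an option for unusually high request rates.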
Architecture Deep Dive
Amazon Glacier
Benefits of Amazon S3 & Glacier
• Durable, Available, & Scalable
• Security & Compliance
• Flexible Management
• Ecosystem
• Low-Cost
Just Say No
X No capital investment
X No commitment
X No capacity planning
X No idle capacity
X No onerous media handling
X No complex technology refreshes
X No undifferentiated heavy lifting
99.999999999%
Durability
Durability for long-term preservation
Built-in Fixity Checking
Automatic recovery
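A short worked calculation shows what eleven nines means in practice. It assumes the durability figure applies per object per year and that losses are independent, which is how AWS usually frames the number; the function names are illustrative.

```python
from fractions import Fraction

# 99.999999999% durability, expressed exactly.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: one expected loss every "
          f"{int(years_per_expected_loss(n)):,} years")
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object every 10,000 years.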
Flexible Data Retrieval Options
All of your Glacier data is accessible with any of three retrieval options.
Standard Retrieval
• Current model
• 3-5 hours
• $0.01/GB
Bulk Retrieval
• Batch/Bulk access
• 5-12 hours
• $0.0025/GB
Expedited Retrieval
• Rare urgent access
• 1-5 minutes
• $0.03/GB
On-site tape replacement Off-site tape replacement
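The trade-off between the three retrieval options can be sketched as a small lookup using the per-GB prices and time ranges from the slide. The function name is illustrative, and the sketch deliberately ignores per-request fees and any later price changes.

```python
from decimal import Decimal

# Prices and typical times as quoted on the slide (2017 figures).
RETRIEVAL_OPTIONS = {
    "Expedited": {"usd_per_gb": Decimal("0.03"), "typical_time": "1-5 minutes"},
    "Standard": {"usd_per_gb": Decimal("0.01"), "typical_time": "3-5 hours"},
    "Bulk": {"usd_per_gb": Decimal("0.0025"), "typical_time": "5-12 hours"},
}


def retrieval_cost_usd(option: str, gigabytes: int) -> Decimal:
    """Per-GB retrieval charge in dollars for the chosen option."""
    return RETRIEVAL_OPTIONS[option]["usd_per_gb"] * Decimal(gigabytes)


if __name__ == "__main__":
    for name, info in RETRIEVAL_OPTIONS.items():
        cost = retrieval_cost_usd(name, 1_000)
        print(f"{name}: ${cost:.2f} for 1,000 GB, ready in {info['typical_time']}")
```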
Multiple Ways to Access S3 and Glacier
1. Use S3 and Glacier via S3 Lifecycle Management
2. Direct Amazon Glacier API/SDK
3. AWS Storage Gateway
4. 3rd party tools and gateways (for example, FastGlacier)
Amazon Glacier – Direct access/APIs
Data Upload: Create Vault → Configure Access → Upload Archives → Register Archive ID
Data Retrieval: Initiate Retrieval → Async Retrieval Completion → Completion Notification → Download Data
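The two flows can be illustrated with a toy in-memory model. This is not the Glacier API or SDK: `ToyVault` and its methods are hypothetical stand-ins that only mimic the shape of the workflow (the real operations are UploadArchive, InitiateJob, DescribeJob, and GetJobOutput). It shows why the caller has to register the archive ID and why a download only works after the asynchronous job completes.

```python
import itertools


class ToyVault:
    """In-memory stand-in for a vault; illustrates the workflow only."""

    def __init__(self, name: str):
        self.name = name
        self._archives = {}   # archive_id -> bytes
        self._jobs = {}       # job_id -> {"archive_id", "completed"}
        self._counter = itertools.count(1)

    def upload_archive(self, data: bytes) -> str:
        archive_id = f"archive-{next(self._counter)}"
        self._archives[archive_id] = data
        return archive_id     # the caller must record this ID

    def initiate_retrieval(self, archive_id: str) -> str:
        if archive_id not in self._archives:
            raise KeyError(archive_id)
        job_id = f"job-{next(self._counter)}"
        self._jobs[job_id] = {"archive_id": archive_id, "completed": False}
        return job_id

    def complete_job(self, job_id: str) -> None:
        # Stands in for the asynchronous completion and notification.
        self._jobs[job_id]["completed"] = True

    def download(self, job_id: str) -> bytes:
        job = self._jobs[job_id]
        if not job["completed"]:
            raise RuntimeError("retrieval job still in progress")
        return self._archives[job["archive_id"]]


if __name__ == "__main__":
    vault = ToyVault("examplevault")
    index = {"report.pdf": vault.upload_archive(b"...")}  # register archive ID
    job = vault.initiate_retrieval(index["report.pdf"])
    vault.complete_job(job)                               # completion arrives later
    print(vault.download(job))
```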
Third-party tools and gateways
• Consumer grade: less than $50 per license
  • Example: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small / medium business: $500 - $1,000 per license
  • Example: Synology, Veeam, QNAP
• Enterprise gateway and data management software
  • Example: NetApp AltaVault, Commvault, StorNext, StoreReduce, Vidispine
Which option should I choose?
• Use S3 lifecycle managed Amazon Glacier if the S3 object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store more metadata/indices in a database
• Use 3rd party tools to minimize coding
Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant temporary access
How does Vault Lock work?
• Do you use WORM drives/media?
• How do you achieve WORM?
• What happens to data under retention if I close my account?
• Does AWS provide Designated 3rd party service?
Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
Example control: 1-year record retention
• Deny delete archive operation
• From anybody (root, administrators, users, business partners)
• When ArchiveAgeInDays is <= 365 days
Archive age computed from the time an archive lands in a vault
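The control described above could be expressed as a Vault Lock policy along the following lines. This is a sketch: the vault ARN, statement ID, and function name are hypothetical, and the policy should be validated against the Glacier documentation before locking, because a locked policy cannot be changed.

```python
import json


def one_year_retention_policy(vault_arn: str, retention_days: int = 365) -> str:
    """Return a JSON policy denying archive deletion during retention."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-during-retention",   # hypothetical name
            "Principal": "*",                        # applies to everybody
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {
                # Deny while the archive age is <= the retention period.
                "NumericLessThanEquals": {
                    "glacier:ArchiveAgeInDays": str(retention_days)
                }
            },
        }],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical vault ARN, for illustration only.
    print(one_year_retention_policy(
        "arn:aws:glacier:us-west-2:111122223333:vaults/examplevault"))
```

The two-step locking mentioned earlier means the policy is first attached in an in-progress state for testing and only becomes immutable once the lock is completed.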
Example control: 1-year record retention
Vault Lock in the Amazon Glacier console
Large Scale Disaster Recovery
Jamal Mazhar, Head of Infrastructure and DevOps @ Sprinklr
© 2017 Sprinklr, Inc. All rights reserved.
MOST COMPLETE SOCIAL MEDIA MANAGEMENT PLATFORM
Reach + Engage + Listen
advertising, marketing, commerce, care, research + insights
CUSTOMER EXPERIENCE MANAGEMENT PLATFORM
Integrate legacy systems, Collaborate across silos, Unified Platform
experience cloud
Social is about managing the disruption of connected & empowered customers
Digital transformation is about managing new expectations
Sprinklr Architecture
Sprinklr Platform - Key Technologies
• Applications
• DBs
• Ops & Automation (+ custom code)
• AWS: S3, EC2, CloudFront, EBS + CloudWatch, Elastic Transcoder, ElastiCache, IAM, Route 53, SES, SNS, SQS, VPC, ELB, KMS
What is Disaster Recovery?
• Difference between High Availability and Disaster Recovery
• S3 is already Highly Available within the same region
• Different approaches to Disaster Recovery and their Pros/Cons and challenges
  • Hot/Cold aka Active/Passive
  • Hot/Warm aka Active/Standby
  • Hot/Hot aka Active/Active
Sprinklr Disaster Recovery Approach
• Disaster Recovery SLAs
  • Recovery Point Objective (RPO)
  • Recovery Time Objective (RTO)
• Use of two AWS regions
• Independent 3rd party validation of our DR process
Scale and Scope of Sprinklr Disaster Recovery
• Large data size
  • Thousands of EBS volumes for Mongo, Solr, Cassandra
  • 1400+ big SSD i3 servers for 100+ Elasticsearch clusters
• Thousands of servers running close to 100 different services
• Each service has unique configuration and code
Three Major Challenges
1. Copying the data and configuration information quickly within the same region
2. Transferring the data to a different region and keeping it in sync daily
3. Automation and processes to restore the entire platform quickly
Challenge 1 – Copying Data and Configuration
• Traditional backup approaches didn’t work for Mongo and Solr
• EBS snapshots
• Backup status dashboard and process
• Limits we ran into due to scale
  • Concurrent snapshot limits
  • S3 IO limits for Elasticsearch backup
Challenge 2 – Transferring and Syncing Data
• Hit limits in keeping petabytes of data across Virginia and Oregon in sync
  • Concurrent incremental snapshot copy limit
  • Bandwidth limits
• What worked well from day one without tweaking
  • S3 cross region sync for Elasticsearch
  • S3 is eventually consistent, no issues in our use case
Challenge 3 – Restoring the Entire Platform
• Custom code to automate the entire platform sequence and dependencies
  • Launching servers
  • Creating / Mounting volumes from snapshots, Code deployment
  • Creating ELBs, updating DNS, Application Configuration
• Restoring over 1 PB of data for Elasticsearch clusters from S3
• Workaround for API limits and throttling
• Workaround for capacity limits
• Built custom dashboard to provide restoration status
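Automating a restore "sequence and dependencies" is essentially a dependency-ordering problem. The sketch below is not Sprinklr's code: the step names come from the slide, but the dependency edges between them are my own assumptions, chosen only to show how a topological sort yields a valid run order. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
# The edges are illustrative assumptions, not taken from the talk.
RESTORE_DEPENDENCIES = {
    "launch servers": set(),
    "create volumes from snapshots": set(),
    "mount volumes": {"launch servers", "create volumes from snapshots"},
    "deploy code": {"mount volumes"},
    "apply application configuration": {"deploy code"},
    "create ELBs": {"launch servers"},
    "update DNS": {"create ELBs", "apply application configuration"},
}


def restore_order(dependencies: dict) -> list:
    """Return one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, step in enumerate(restore_order(RESTORE_DEPENDENCIES), 1):
        print(f"{position}. {step}")
```

A real orchestrator would also run independent steps in parallel and retry on API throttling, which the slide lists as a separate workaround.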
Restoration Status
Key Results and Takeaways
• Keeping more than 4 petabytes of data in sync across different geo regions
• More than 50 TB of daily incremental data transfer using S3 and EBS volumes
• Bandwidth increase and concurrent snapshot optimization reduced the daily data sync time from 36 hours to 8 hours to help meet the 24-hour RPO
• Configuration metadata and code sync across regions is critical for DR
• One-click automation of the restore process allows us to bring the entire platform up with 1000s of servers in a different region within hours
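A rough calculation shows why the sync window matters for the RPO. It assumes exactly 50 TB per day (the slide says "more than"), decimal terabytes, and a transfer spread evenly over the window; the function name is illustrative.

```python
def required_throughput_gbit_per_s(terabytes: float, hours: float) -> float:
    """Average throughput needed to move the data within the window."""
    bits = terabytes * 1e12 * 8          # decimal TB to bits
    return bits / (hours * 3600) / 1e9   # bits per second to Gbit/s


if __name__ == "__main__":
    for window_hours in (36, 24, 8):
        rate = required_throughput_gbit_per_s(50, window_hours)
        print(f"50 TB in {window_hours} h needs about {rate:.2f} Gbit/s sustained")
```

A 36-hour sync of one day's changes can never catch up with a 24-hour RPO, while an 8-hour sync leaves headroom for retries, at the cost of sustaining roughly 14 Gbit/s during the window under these assumptions.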
Learn more…
STG201 – Storage State of the Union – Wed, 11:30 AM
STG304 – Deep Dive on Data Archiving with Amazon S3 & Amazon Glacier – Wed, 1:45 PM
STG313 – Big Data Breakthroughs – Wed, 12:15 PM OR 7:00 PM
STG303 – Deep Dive on Amazon Glacier – Thurs, 1:45 PM
STG312 – Best Practices for Building a Data Lake in Amazon S3 & Amazon Glacier – Thurs, 3:15 PM
New Storage Training
For Enterprise Storage Engineers
• Learn how to architect and manage highly available solutions on AWS storage services
• Advance toward AWS certifications
• Help your organization migrate to the cloud faster
Online at www.aws.training
• Access 100+ new digital training courses including advanced training on storage
• Deep Dives on S3, EFS, and EBS
• Migrating and Tiering Storage to AWS (Hybrid Solutions)
At re:Invent
• Visit Hands-on Labs at the Venetian
• Attend a proctored “Introduction to EFS” Spotlight Lab on Thursday at 3pm at the Venetian
• Meet Storage experts at the Ask the Experts in Hands-on Labs room at the Venetian
Q&A
Amazon S3 Amazon Glacier
THANK YOU!

STG301_Deep Dive on Amazon S3 and Glacier Architecture

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Dive on Amazon S3 and Glacier Architecture C r a i g C o t t o n , D i r e c t o r P r o d u c t M a n a g e m e n t – A m a z o n S 3 H e n r y Z h a n g , S e n i o r P r o d u c t M a n a g e r – G l a c i e r J a m a l M a z h a r , H e a d o f I n f r a s t r u c t u r e a n d D e v O p s – S p r i n k l r
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AGENDA • Deep dive on Amazon S3 architecture • Deep dive on Glacier architecture • Guest Speaker: Jamal Mazhar, Head of Infrastructure and DevOps @ Sprinklr
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The AWS Storage Portfolio Data Transfer 3rd Party Connectors S3 Transfer Acceleration File Amazon EFS Object Amazon GlacierAmazon S3 Block Amazon EBS (persistent) Amazon EC2 Instance Store (ephemeral) AWS Snow Family AWS Storage Gateway AWS Direct Connect Amazon Kinesis EFS File Sync
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Benefits of Amazon S3 & Glacier Durable, Available, & Scalable Security & Compliance Query In Place Flexible Management Ecosystem
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Architecture Deep Dive Amazon S3
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 By The Numbers 44 Availability Zones (16 more coming in 2018) 16 Regions (5 more coming in 2018) Trillions of objects Millions of requests per second One of first three AWS Services (2006) 99.999999999% Durability
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Architecture Internet End User PUT GET DELETE Load Balancers Metadata Storage API Servers Blob Storage
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Availability Zones S3 stores data in at least 3 Availability Zones (AZ’s) Each AZ can be up to 8 physical data centers Unavailability of a data center or an AZ does not impact overall S3 availability Low latency private network connect data centers and AZ’s Physically separate – even extremely uncommon disasters would only affect a single AZ Data is automatically distributed across a minimum of 3 AZ’s GEO separated within an AWS Region
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Storage Classes & T ransitions S3 Standard S3 Standard – Infrequent Access Amazon Glacier Active data Synchronous access Milliseconds retrieval 2.1¢-GB/mo Archive data Asynchronous access Minutes-to-hours retrieval 0.4¢-GB/mo Infrequently accessed data Synchronous access Milliseconds retrieval 1.25¢-GB/mo
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon S3 Security, Encryption & Compliance T he b roade st se t of tools in the indu stry Security • IAM and Bucket Policies • Access Control Lists • Audit logging with CloudTrail & Alerts with CloudWatch • Secure CloudFormation templates • Amazon Macie • S3 Console Permission Checks Encryption • Encryption in transit with TLS • SSE-S3 – Amazon S3 manages data & keys • SSE-C – Customer managed keys • SSE-KMS – Master keys in KMS • CSE – 100% Customer managed • Default Bucket Encryption • Encryption Status in Inventory Compliance • PCI-DSS • HIPAA/HITECH • FedRAMP • FISMA • EU Data Protection Directive
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cross-region Replication Automatically replicate data to any other AWS Region • Replicate by object, bucket, or prefix • Support for SSE-KMS encrypted objects • Ownership overwrite • Change the object owner in the destination region Region A Region B Cross-region connectivity
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cross Region Replication Examples S3 Standard S3 Standard S3 Standard S-IA S3 Standard Glacier Zero-day Lifecycle Policy to Glacier
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Do More With Your In-place Data • Athena • Redshift Spectrum • QuickSight • EMR Data Lake Storage IoT Storage Machine Learning & AI Storage • AWS IoT • Greengrass • Other IoT sensors • Rekognition • LEX • Polly • MXNet & TensorFlow
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Maximize Throughput with Amazon S3 Amazon S3 automatically scales to thousands of requests per second per prefix based on your steady state traffic • Amazon S3 automatically partitions your prefixes within hours adjusting to increases in request rates • Consider using a three- or four-character hash (see next slide for details)
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Using a Three or Four Character Hash examplebucket/232a-2017-26-05-15-00-00/cust1234234/photo1.jpg examplebucket/7b54-2017-26-05-15-00-00/cust3857422/photo2.jpg examplebucket/921c-2017-26-05-15-00-00/cust1248473/photo2.jpg examplebucket/animations/232a-2017-26-05-15-00-00/cust1234234/animation1.obj examplebucket/videos/ba65-2017-26-05-15-00-00/cust8474937/video2.mpg examplebucket/photos/8761-2017-26-05-15-00-00/cust1248473/photo3.jpg A bit more LIST friendly: Random hash should come before patterns such as dates and sequential IDs Always first ensure that your application can accommodate Due to recent Amazon S3 performance enhancements, most customers no longer need to worry about introducing entropy in key names
  • 16. Architecture Deep Dive: Amazon Glacier
  • 17. Benefits of Amazon S3 & Glacier: Durable, Available, & Scalable • Security & Compliance • Flexible Management • Ecosystem • Low-Cost
  • 18. Just Say No: No capital investment • No commitment • No capacity planning • No idle capacity • No onerous media handling • No complex technology refreshes • No undifferentiated heavy lifting
  • 19. 99.999999999% Durability: Durability for long-term preservation • Built-in fixity checking • Automatic recovery
  • 20. Flexible Data Retrieval Options: All of your Glacier data is accessible with any of three retrieval options. • Standard Retrieval: current model; 3-5 hours; $0.01/GB • Bulk Retrieval: batch/bulk access; 5-12 hours; $0.0025/GB • Expedited Retrieval: rare urgent access; 1-5 minutes; $0.03/GB • (Diagram labels: on-site tape replacement, off-site tape replacement)
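To compare the three tiers for a given restore, the per-GB prices on this slide can be put into a small calculator. This is a rough sketch: it uses only the 2017 figures quoted above, and any other charges (for example per-request fees) are deliberately not modeled. The names `RETRIEVAL_TIERS` and `retrieval_cost_usd` are illustrative.

```python
# Per-GB retrieval prices and typical completion times, as quoted on the slide.
RETRIEVAL_TIERS = {
    "Expedited": {"usd_per_gb": 0.03, "typical_time": "1-5 minutes"},
    "Standard": {"usd_per_gb": 0.01, "typical_time": "3-5 hours"},
    "Bulk": {"usd_per_gb": 0.0025, "typical_time": "5-12 hours"},
}


def retrieval_cost_usd(gigabytes, tier):
    """Data retrieval charge only; other fees are not included."""
    return gigabytes * RETRIEVAL_TIERS[tier]["usd_per_gb"]


if __name__ == "__main__":
    size_gb = 1000  # example restore size
    for name, info in RETRIEVAL_TIERS.items():
        cost = retrieval_cost_usd(size_gb, name)
        print(f"{name:<10} {info['typical_time']:<12} ${cost:,.2f}")
```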
  • 21. Multiple Ways to Access S3 and Glacier: 1. Use S3 and Glacier via S3 Lifecycle Management 2. Direct Amazon Glacier API/SDK 3. AWS Storage Gateway 4. 3rd party tools and gateways (e.g. FastGlacier)
  • 22. Amazon Glacier – Direct access/APIs: Data Upload: Create Vault → Configure Access → Upload Archives → Register Archive ID • Data Retrieval: Initiate Retrieval → Async Retrieval Completion → Completion Notification → Download Data
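The upload and retrieval flow on this slide might look roughly like the sketch below. It assumes `glacier` is a boto3 Glacier client (`boto3.client("glacier")`) or any object exposing the same methods; the function names are illustrative. For brevity it polls `describe_job` instead of waiting for the completion notification shown on the slide, and it leaves out vault creation and access configuration.

```python
import time


def upload_archive(glacier, vault_name, data, description=""):
    """Data upload: send bytes to a vault and return the archive ID.

    The caller must record the returned ID (for example in a database),
    because it is the only handle for retrieving the archive later.
    """
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    return response["archiveId"]


def retrieve_archive(glacier, vault_name, archive_id, tier="Standard",
                     poll_seconds=900, sleep=time.sleep):
    """Data retrieval: start an asynchronous job, wait, then download."""
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Simplification: poll for completion rather than subscribing to
    # the completion notification.
    while True:
        status = glacier.describe_job(vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} ended as {status['StatusCode']}")

    output = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Passing the client in as a parameter keeps the sketch testable with a stand-in object, and the injectable `sleep` avoids real waiting in tests.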
  • 23. Third-party tools and gateways: • Consumer grade: less than $50 per license. Examples: Cloudberry, FastGlacier, Arq (Haystack Software) • Small / medium business: $500 - $1,000 per license. Examples: Synology, Veeam, QNAP • Enterprise gateway and data management software. Examples: NetApp AltaVault, Commvault, StorNext, StoreReduce, Vidispine
  • 24. Which option should I choose? • Use S3 lifecycle-managed Amazon Glacier if the S3 object keys are sufficient for index/search capability • Use Amazon Glacier directly if you already plan to store more metadata/indices in a database • Use 3rd party tools to minimize coding
  • 25. Compliance storage with Vault Lock: Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy • Time-based retention • MFA authentication • Controls govern all records in a vault • Immutable policy • Two-step locking
  • 26. Vault Lock for compliance storage: • Non-overwrite, non-erasable records • Time-based retention with “ArchiveAgeInDays” control • Policy lockdown (strong governance) • Legal hold with vault-level tags • Configure optional designated third-party access and grant temporary access
  • 27. How does Vault Lock work? • Do you use WORM drives/media? • How do you achieve WORM? • What happens to data under retention if I close my account? • Does AWS provide a designated third-party service?
  • 28. Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
  • 29. Example control: 1-year record retention • Deny the delete-archive operation • From anybody (root, administrators, users, business partners) • When ArchiveAgeInDays is <= 365 days • Archive age is computed from the time an archive lands in a vault
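A hedged sketch of what such a control could look like as a Vault Lock policy document. The helper name `build_retention_policy`, the statement ID, and the vault ARN are illustrative assumptions; the action `glacier:DeleteArchive` and the condition key `glacier:ArchiveAgeInDays` are the ones Glacier defines, and `NumericLessThanEquals` is chosen here to match the "<= 365" wording of the slide.

```python
import json


def build_retention_policy(vault_arn, retention_days=365):
    """Deny archive deletion by anyone while the archive is within retention."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-during-retention",  # illustrative name
                "Principal": "*",  # applies to everybody, including root
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThanEquals": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    policy = build_retention_policy(
        "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
    )
    print(json.dumps(policy, indent=2))
    # Two-step locking with boto3 would then be, roughly:
    #   lock = glacier.initiate_vault_lock(
    #       vaultName="examplevault", policy={"Policy": json.dumps(policy)})
    #   ...test the policy while the lock is in progress...
    #   glacier.complete_vault_lock(vaultName="examplevault", lockId=lock["lockId"])
```

Once the lock is completed the policy can no longer be changed, so the in-progress window is the time to verify that the control behaves as intended.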
  • 30. Example control: 1-year record retention
  • 31. Vault Lock in the Amazon Glacier console
  • 32. Large Scale Disaster Recovery Jamal Mazhar, Head of Infrastructure and DevOps @ Sprinklr © 2017 Sprinklr, Inc. All rights reserved.
  • 33. Most Complete Social Media Management Platform: Reach + Engage + Listen (advertising, marketing, commerce, care, research + insights) • Customer Experience Management Platform: Integrate legacy systems; Collaborate across silos; Unified Platform (experience cloud) • Social is about managing the disruption of connected & empowered customers • Digital transformation is about managing new expectations
  • 34. Sprinklr Architecture
  • 35. Sprinklr Platform - Key Technologies: Applications • DBs • Ops & Automation (+ custom code) • AWS: S3, EC2, CloudFront, EBS, plus CloudWatch, Elastic Transcoder, ElastiCache, IAM, Route 53, SES, SNS, SQS, VPC, ELB, KMS
  • 36. What is Disaster Recovery: • Difference between High Availability and Disaster Recovery • S3 is already highly available within the same region • Different approaches to Disaster Recovery, with their pros/cons and challenges: Hot/Cold (aka Active/Passive); Hot/Warm (aka Active/Standby); Hot/Hot (aka Active/Active)
  • 37. Sprinklr Disaster Recovery Approach: • Disaster Recovery SLAs: Recovery Point Objective (RPO); Recovery Time Objective (RTO) • Use of two AWS regions • Independent 3rd party validation of our DR process
  • 38. Scale and Scope of Sprinklr Disaster Recovery: • Large data size • Thousands of EBS volumes for Mongo, Solr, Cassandra • 1400+ big SSD i3 servers for 100+ Elasticsearch clusters • Thousands of servers running close to 100 different services • Each service has unique configuration and code
  • 39. Three Major Challenges: 1. Copying the data and configuration information quickly within the same region 2. Transferring the data to a different region and keeping it in sync daily 3. Automation and processes to restore the entire platform quickly
  • 40. Challenge 1 – Copying Data and Configuration: • Traditional backup approaches didn’t work for Mongo and Solr • EBS snapshots • Backup status dashboard and process • Limits we ran into due to scale: concurrent snapshot limits; S3 I/O limits for Elasticsearch backup
  • 41. Challenge 2 – Transferring and Syncing Data: • Hit limits in keeping petabytes of data across Virginia and Oregon in sync: concurrent incremental snapshot copy limit; bandwidth limits • What worked well from day one without tweaking: S3 cross-region sync for Elasticsearch; S3 is eventually consistent, which caused no issues in our use case
  • 42. Challenge 3 – Restoring the Entire Platform: • Custom code to automate the entire platform sequence and dependencies: launching servers; creating/mounting volumes from snapshots, code deployment; creating ELBs, updating DNS, application configuration • Restoring over 1 PB of data for Elasticsearch clusters from S3: workaround for API limits and throttling; workaround for capacity limits • Built a custom dashboard to provide restoration status
  • 43. Restoration Status
  • 44. Key Results and Takeaways: • Keeping more than 4 petabytes of data in sync across different geo regions • More than 50 TB of daily incremental data transfer using S3 and EBS volumes • Bandwidth increase and concurrent snapshot optimization reduced the daily data sync time from 36 hours to 8 hours, helping meet the 24-hour RPO • Configuration metadata and code sync across regions is critical for DR • One-click automation of the restore process allows us to bring the entire platform up with 1000s of servers in a different region within hours
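As a back-of-the-envelope check on these numbers, the sketch below estimates the sustained cross-region throughput each sync window implies. It assumes, for illustration only, that exactly 50 TB (decimal terabytes) moves within the window; the slide says "more than 50 TB" and does not state how the transfer is distributed over time. The function name is hypothetical.

```python
def sustained_gbit_per_s(terabytes, hours):
    """Average throughput (Gbit/s) needed to move `terabytes` in `hours`."""
    bits = terabytes * 1e12 * 8  # decimal TB -> bits
    seconds = hours * 3600
    return bits / seconds / 1e9


if __name__ == "__main__":
    for window_hours in (36, 8):
        rate = sustained_gbit_per_s(50, window_hours)
        print(f"50 TB in {window_hours} h needs about {rate:.1f} Gbit/s sustained")
```

Under these assumptions the 36-hour window corresponds to roughly 3 Gbit/s and the 8-hour window to roughly 14 Gbit/s. A 36-hour sync cannot fit a 24-hour RPO at all, which is consistent with the slide's point that the optimization was needed to meet it.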
  • 45. Learn more: STG201 – Storage State of the Union – Wed, 11:30 AM • STG304 – Deep Dive on Data Archiving with Amazon S3 & Amazon Glacier – Wed, 1:45 PM • STG313 – Big Data Breakthroughs – Wed, 12:15 PM or 7:00 PM • STG303 – Deep Dive on Amazon Glacier – Thurs, 1:45 PM • STG312 – Best Practices for Building a Data Lake in Amazon S3 & Amazon Glacier – Thurs, 3:15 PM
  • 46. New Storage Training: For Enterprise Storage Engineers • Learn how to architect and manage highly available solutions on AWS storage services • Advance toward AWS certifications • Help your organization migrate to the cloud faster. Online at www.aws.training • Access 100+ new digital training courses including advanced training on storage • Deep Dives on S3, EFS, and EBS • Migrating and Tiering Storage to AWS (Hybrid Solutions). At re:Invent • Visit Hands-on Labs at the Venetian • Attend a proctored “Introduction to EFS” Spotlight Lab on Thursday at 3pm at the Venetian • Meet Storage experts at Ask the Experts in the Hands-on Labs room at the Venetian
  • 47. Q&A: Amazon S3 • Amazon Glacier
  • 48. THANK YOU!