Come hear about the services AWS provides for managing data, and when to use each one. You will learn about both data movement and coordination, as well as data storage and analysis, including when to use relational and NoSQL approaches, Hadoop, and data warehousing. This session will highlight how AWS data services have helped real-world customers.
AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.
1. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
AWS as a Data Platform
Chris Keyser
ckeyser@amazon.com
2. Why AWS?
Ease of use
Lower costs
3. Lower costs
no capital investment
pay as you go
no subscriptions
only pay for what you use
4. Ease of use
programmable
zero admin
easy to configure
integrate with existing tools
5. One tool to rule them all
6. Use the right tools
7. Movement and Coordination
Data Pipeline
Direct Connect
Storage Gateway
Import / Export
8. Storage and Analysis Services
Instance Storage: EC2, EBS
SQL Stores: Redshift, RDS
Hadoop: EMR
NoSQL: DynamoDB
Stream: Kinesis
Search: CloudSearch
Storage Services: S3, CloudFront, Glacier
9. Movement and Coordination
10. Movement and Coordination - Plumbing
Direct Connect: dedicated network pipes
Storage Gateway: storage backup & archiving
Import / Export: ship us your disks
11. Movement and Coordination - Orchestration
AWS Data Pipeline
Resource management
Scheduling, execution, and retry
Dependency tracking
Failure notification
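The orchestration features named on this slide (execution, retry, dependency tracking, failure notification) can be illustrated with a minimal sketch in plain Python. This is not the AWS Data Pipeline API; `run_pipeline`, the task names, and the `notify` callback are hypothetical, and the sketch assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps, max_retries=2, notify=print):
    """Run tasks in dependency order, retrying each one on failure.

    tasks: dict of name -> zero-argument callable (hypothetical activities)
    deps:  dict of name -> set of task names that must succeed first
    Returns the names of the tasks that completed, in execution order.
    """
    completed = []
    # Include every task in the graph, even those with no dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_retries + 1:
                    # Out of retries: send a failure notification and stop
                    # the run. A fuller version would skip only dependents.
                    notify(f"FAILED {name} after {attempt} attempts: {exc}")
                    return completed
    return completed
```

A call such as `run_pipeline({"copy": copy_fn, "load": load_fn}, {"load": {"copy"}})` would run `copy` before `load` and retry either task up to two more times before notifying.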
12. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Data Storage and Analysis
13. Storage Services – Object Store
Amazon S3
> 1.5 million peak requests/sec
Designed for 99.999999999% durability
Trillions of objects
Stores anything
Lifecycle and Versioning
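To put the eleven-nines durability figure in perspective, here is a small worked calculation. It assumes the figure is read as an annual, per-object design target (as AWS describes it) and that losses are independent; the function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(num_objects):
    """Expected number of objects lost per year under the design target."""
    return num_objects * (1 - DURABILITY)


def years_per_lost_object(num_objects):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(num_objects)


print(float(expected_annual_loss(10_000)))  # 1e-07
print(int(years_per_lost_object(10_000)))   # 10000000
```

Under these assumptions, storing 10,000 objects corresponds to an expected loss of one object roughly every 10 million years.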
14. Storage Services - Archive Storage
Amazon Glacier
Low cost, durable archiving
“Cold Storage”
Infrequently accessed data
Integrated S3 lifecycle policies
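As a hedged illustration of the S3 lifecycle integration, the sketch below builds a lifecycle rule that moves objects to Glacier after a number of days. The dictionary shape follows what boto3's `put_bucket_lifecycle_configuration` accepts today, which may differ from the API at the time of this talk. The helper name, prefix, and bucket name are hypothetical, and nothing here calls AWS.

```python
def glacier_lifecycle_rule(prefix, days_until_archive):
    """Build a lifecycle configuration that archives a prefix to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = glacier_lifecycle_rule("logs/", 30)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```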
15. Storage Services – Edge Caching
Amazon CloudFront
Simple to use with global footprint
Streaming support
Large file distribution
Private content
S3, EC2 and ELB integration
Geo restrictions
17. Instance Storage - Options
Amazon EC2
• Ephemeral Storage (“local”)
– You manage backup/restore
– High Storage instances available
– i2.8xlarge – 6.4 TB SSD (350K IOPS)
– hs1.8xlarge – 48 TB Disk Storage
• Elastic Block Store (EBS)
– “Network Attached Storage”
– Snapshot, Encryption
– Provisioned throughput (IOPS)
18. Instance Storage - Build Your Own
Amazon EC2
NFS
MongoDB
Cassandra
GraphLab
Titan
Kafka
Lustre
Gluster
Flume
Scribe
Presto
…and more
19. SQL Stores - Managed Relational DB
Amazon RDS
MySQL, Oracle, SQL Server, Postgres
Backup/Restore, High Availability
Push Button Scalability
Up to 3 TB and 30K IOPS
20. SQL Stores - Petabyte Data Warehouse
Amazon Redshift
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed
$1,000/TB/Year
21. SQL Stores - Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Backup and restore via S3
– Parallel load from S3, EMR, or DynamoDB
• HW optimized for data processing
– DW1: 2TB – 1.6PB Magnetic
– DW2: 160GB – 256TB SSD
[Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
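The leader/compute split described on this slide can be sketched as a scatter-gather aggregate: each compute node reduces its local slice of a column, and the leader merges the partial results. This is a toy illustration in plain Python, not Redshift's implementation; the function names and the thread pool standing in for compute nodes are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(column_slice):
    """Compute-node role: aggregate the locally stored slice of a column."""
    return sum(column_slice), len(column_slice)


def leader_average(slices):
    """Leader-node role: fan the query out, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


# Three "compute nodes", each holding part of one column.
slices = [[10, 20], [30], [40, 50, 60]]
print(leader_average(slices))  # 35.0
```

Only the small partial results (a sum and a count per node) travel back to the leader, which is why this pattern scales as nodes are added.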
22. NoSQL – Dial Up Capacity
Amazon DynamoDB
NoSQL Database
Seamless scalability
Zero admin
Single digit millisecond latency
23. NoSQL - Durable Low Latency at Scale
WRITES
Continuously replicated to 3 AZs
Quorum acknowledgment
Persisted to disk (custom SSD)
READS
Strongly or eventually consistent
No trade-off in latency
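A minimal sketch of what quorum acknowledgment across three replicas could look like. The slide does not describe DynamoDB's actual protocol, so the majority rule (2 of 3), the `Replica` class, and `quorum_write` are assumptions made purely for illustration.

```python
class Replica:
    """Stand-in for a storage node in one availability zone."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def put(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.data[key] = value


def quorum_write(replicas, key, value):
    """Return True if a majority of replicas acknowledged the write."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica.put(key, value)
            acks += 1
        except ConnectionError:
            continue  # an unreachable replica simply does not acknowledge
    return acks >= needed
```

With three replicas, a write still succeeds when one zone is unreachable, but not when two are.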
24. Hadoop – On Demand
Amazon Elastic MapReduce
Hive, Impala, Spark, Pig, MapReduce
Easy to use; fully managed
On-demand and spot pricing
Persistent and transient clusters
Deep integration with S3
25. Hadoop – Tuned for AWS
[Diagram labels: Master instance group, Core instance group, Task instance group, HDFS, Amazon S3, Amazon Redshift, Amazon DynamoDB]
27. Streaming - at Scale
Amazon Kinesis
Real-time data collection
Seamlessly scale to gigabytes/s
Low cost managed service
EMR integration
28. Streaming - Amazon Kinesis Architecture
Millions of sources producing 100s of terabytes per hour
Front End: Authentication, Authorization
Durable, highly consistent storage replicates data across three data centers (availability zones)
Ordered stream of events supports multiple readers
Inexpensive: $0.028 per million puts
• Aggregate analysis in Hadoop or data warehouse
• Machine learning algorithms or sliding window analytics
• Real-time dashboards and alarms
• Aggregate and archive to S3
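The idea of an ordered stream that supports multiple independent readers can be sketched as an append-only log where each reader tracks its own position. This is a conceptual toy, not the Kinesis API; the `Stream` class and reader names are hypothetical. The cost helper uses only the per-million-puts price quoted on the slide.

```python
class Stream:
    """Append-only log; every reader keeps its own read position."""

    def __init__(self):
        self._records = []
        self._positions = {}

    def put(self, record):
        """Append a record and return its sequence number."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, reader, limit=10):
        """Return the next batch for this reader, in arrival order."""
        start = self._positions.get(reader, 0)
        batch = self._records[start:start + limit]
        self._positions[reader] = start + len(batch)
        return batch


def put_cost_usd(num_puts, price_per_million=0.028):
    """Cost of puts at the quoted price of $0.028 per million."""
    return num_puts / 1_000_000 * price_per_million
```

Because positions are per reader, a dashboard and an archiver can consume the same events at different speeds without affecting each other. At the quoted price, one billion puts would cost about $28.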
29. Search – Made Simple
Amazon CloudSearch
Fully managed search engine
Simple to operate
Highly available
User configurable scaling
Advanced feature support
• 34 languages
• Algorithmic stemming
• Geospatial search
• Faceted search
• Suggestions
• Highlighting
• Field weighting
• …
30. The right tool. At the right time. At the right scale.
31. Thank You
Chris Keyser
ckeyser@amazon.com
Editor's Notes
OBAMA for America -> In the system, Ruby on Rails (RoR), Python/Django, PHP, and a host of other front- and mid-tier technologies intermingled, creating a robust heterogeneous design. Below that, the use of 10 different structured storage systems reflected a focus on bringing tools suited to the data itself. Intermingling technologies included relational database management services like Amazon Relational Database Service (Amazon RDS) for MySQL, PostgreSQL, and Microsoft SQL Server; NoSQL software like MongoDB, Apache Hadoop, Vertica, and LevelDB; and Amazon S3, Amazon DynamoDB, and Amazon SimpleDB.
DynamoDB latency is under 10 ms and highly consistent.
Most importantly, the data is durable in DynamoDB, constantly replicated across multiple data centers and persisted to SSD storage.
More context – MongoDB, Cassandra
Variety – can process many different data types, custom SerDes, etc.
Velocity – certain packages that run on Hadoop help with real-time data ingestion, like Flume, Storm, Kafka, and Spark Streaming
Volume – designed to work on massive data sets.
Start an EMR cluster using console or cli tools
Master instance group created that controls the cluster
Core instance group created for life of cluster
Core instances run DataNode and TaskTracker daemons
Optional task instances can be added or subtracted to perform work (SPOT)
S3 can be used as underlying ‘file system’ for input/output data
Master node coordinates distribution of work and manages cluster state
Core and Task instances read-write to S3
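The cluster layout in the notes above (one master group, a core group for the life of the cluster, optional Spot task instances) can be sketched as an instance-group definition. The dictionary shape follows the `InstanceGroups` structure accepted by boto3's EMR `run_job_flow`; the helper name, instance type, and bid price are hypothetical placeholders, and nothing here calls AWS.

```python
def emr_instance_groups(core_count, task_count,
                        instance_type="m3.xlarge", task_bid="0.10"):
    """Build master, core, and optional Spot task instance groups."""
    groups = [
        # The master group controls the cluster.
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core instances exist for the life of the cluster.
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task instances are optional, so they are a natural fit for Spot.
        groups.append(
            {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
             "BidPrice": task_bid, "InstanceType": instance_type,
             "InstanceCount": task_count})
    return groups


groups = emr_instance_groups(core_count=2, task_count=4)
```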
Volume – pretty high
Velocity – very high
Variety – good if it fits into 40k, otherwise need to do some lifting.