This document describes how to build a modern data analytics architecture on AWS. It surveys the AWS services used to ingest, store, process, and query large volumes of data in both real-time and batch scenarios, including Amazon S3, Kinesis, EMR, Redshift, Athena, Elasticsearch, and Glue. The architectures shown include real-time data pipelines, data lakes, and batch ETL/ELT processes. The performance, cost-effectiveness, and scalability benefits of these services are highlighted.
8. ID Name
1 John Smith
2 Jane Jones
3 Peter Black
4 Pat Partridge
5 Sarah Cyan
6 Brian Snail
The same rows, regrouped in the order shown on the slide:
1 John Smith, 4 Pat Partridge
2 Jane Jones, 5 Sarah Cyan
3 Peter Black, 6 Brian Snail
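The row order on this slide (1, 4, 2, 5, 3, 6) is consistent with dealing the six rows round-robin across three partitions. The following is a minimal sketch of that idea, not the deck's own method: the partition count of three, the round-robin strategy, and the `distribute_round_robin` helper are all assumptions made for illustration.

```python
# Rows from the slide's ID/Name table.
rows = [
    (1, "John Smith"),
    (2, "Jane Jones"),
    (3, "Peter Black"),
    (4, "Pat Partridge"),
    (5, "Sarah Cyan"),
    (6, "Brian Snail"),
]


def distribute_round_robin(rows, num_partitions):
    """Deal rows across partitions in turn (assumed strategy, not stated on the slide)."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


# Three partitions is an assumption inferred from the slide's row order.
partitions = distribute_round_robin(rows, num_partitions=3)
for number, partition in enumerate(partitions, start=1):
    print(f"partition {number}: {partition}")
```

Reading the partitions in sequence gives the IDs 1, 4, 2, 5, 3, 6, matching the order in which the slide lists the rows.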
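9. 1 John Smith, 4 Pat Partridge
2 Jane Jones, 5 Sarah Cyan
3 Peter Black, 6 Brian Snail
[Figure: a single SQL statement fans out as three SQL statements, one per group of rows; the three Results are combined into a single Results.]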
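The SQL and Results labels on this slide suggest a scatter-gather pattern: the same query runs against each group of rows in parallel, and the partial results are merged into one result set. Here is a hedged sketch of that pattern using threads in place of cluster nodes. The three-way partition layout, the `run_on_partition` and `scatter_gather` helpers, and the example filter are assumptions for illustration, not part of the original deck.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed layout: the slide's six rows split into three partitions.
partitions = [
    [(1, "John Smith"), (4, "Pat Partridge")],
    [(2, "Jane Jones"), (5, "Sarah Cyan")],
    [(3, "Peter Black"), (6, "Brian Snail")],
]


def run_on_partition(partition, predicate):
    """Stand-in for one node evaluating the query on its local rows."""
    return [row for row in partition if predicate(row)]


def scatter_gather(partitions, predicate):
    """Run the same predicate on every partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_results = list(
            pool.map(lambda part: run_on_partition(part, predicate), partitions)
        )
    merged = [row for partial in partial_results for row in partial]
    return sorted(merged)  # final ordering by ID


# Roughly equivalent to: SELECT * FROM people WHERE name LIKE 'J%'
# (the table name "people" is hypothetical).
results = scatter_gather(partitions, lambda row: row[1].startswith("J"))
print(results)
```

Each partition only scans its own rows, so the work per partition shrinks as partitions are added; the merge step is the only part that sees all matching rows.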
28. Compute Flexibility
Compute: C4 Family, C3 Family (Machine Learning)
Memory: X1 Family, R3 Family (Interactive Analysis)
Storage: D2 Family, I2 Family (Large HDFS)
General: M4 Family, M3 Family (Batch Process)
29. Cost & Time
[Figure: two charts, each plotting # CPUs against Time. One is labeled "Wall clock time: 1 hour", the other "Wall clock time: 10 hours".]
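One reading of the two charts is that, when compute is billed per CPU-hour, running many CPUs for a short time costs about the same as running few CPUs for a long time. The sketch below works through that arithmetic under stated assumptions: the job is perfectly parallel and needs 10 CPU-hours in total, the CPU counts of 10 and 1 are assumed (the slide gives only the wall clock times), and the price of 0.10 per CPU-hour is hypothetical.

```python
def wall_clock_hours(total_cpu_hours, num_cpus):
    """Elapsed time for a perfectly parallel job (an idealizing assumption)."""
    return total_cpu_hours / num_cpus


def total_cost(total_cpu_hours, num_cpus, price_per_cpu_hour):
    """Cost = CPUs x elapsed hours x price per CPU-hour."""
    hours = wall_clock_hours(total_cpu_hours, num_cpus)
    return num_cpus * hours * price_per_cpu_hour


TOTAL_CPU_HOURS = 10       # assumed size of the job
PRICE_PER_CPU_HOUR = 0.10  # hypothetical price, not from the slide

for cpus in (10, 1):       # assumed CPU counts for the two charts
    hours = wall_clock_hours(TOTAL_CPU_HOURS, cpus)
    cost = total_cost(TOTAL_CPU_HOURS, cpus, PRICE_PER_CPU_HOUR)
    print(f"{cpus} CPUs: wall clock {hours:g} h, cost {cost:.2f}")
```

Under these assumptions the cost is the same in both cases while the wall clock time differs by a factor of ten. Real jobs rarely scale perfectly, so the equality is an upper bound on the benefit rather than a guarantee.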
40. Comparison of SQL Processing engines
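Engine [name missing]: Data Structure: Semi; Languages: SQL, HiveQL; Data Store: S3/HDFS
Engine [name missing; only "SQL" survives]: Data Structure: Semi; Languages: SQL; Data Store: S3/HDFS
Amazon Redshift: Data Structure: Full; Languages: SQL; Data Store: Local
Amazon Athena: Data Structure: Semi; Languages: SQL; Data Store: S3
Performance: [comparison graphic not recoverable]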
41. Comparison of SQL Processing engines
Use Case
Engine [name missing]: Transformation
Engine [name missing; only "SQL" survives]: SQL Queries for S3/HDFS
Amazon Redshift: Fully Featured SQL Database
Amazon Athena: Serverless SQL Queries for S3
56. New X1 Instance - Tons of Memory
• Large-scale, in-memory applications
• Intel® Xeon® E7 8880 v3 Haswell processors
• Up to 2TB of memory
• Up to 128 vCPUs per instance
57. Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, and media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting and decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of independent transactions to accelerate throughput
P state & C state control – Provides granular performance tuning for cores and sleep states to improve overall application performance