The AWS Workshop Series Online is a series of live webinars designed for IT professionals who want to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud, or want to further expand their skills and expertise. In this series, we will cover 'Modern Data Architectures for Business Insights at Scale'.
10. ID Name
1 John Smith
2 Jane Jones
3 Peter Black
4 Pat Partridge
5 Sarah Cyan
6 Brian Snail
1 John Smith
4 Pat Partridge
2 Jane Jones
5 Sarah Cyan
3 Peter Black
6 Brian Snail
11. 1 John Smith
4 Pat Partridge
2 Jane Jones
5 Sarah Cyan
3 Peter Black
6 Brian Snail
[Diagram labels: SQL (x4), Results (x4)]
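The slide text does not say how the rows are grouped. The ordering 1, 4, 2, 5, 3, 6 is consistent with round-robin placement across three nodes, so the sketch below assumes that, and assumes each node runs the same filter before the partial results are merged. The function names and the example predicate are hypothetical.

```python
from typing import Callable

Row = tuple[int, str]

ROWS: list[Row] = [
    (1, "John Smith"), (2, "Jane Jones"), (3, "Peter Black"),
    (4, "Pat Partridge"), (5, "Sarah Cyan"), (6, "Brian Snail"),
]


def distribute_round_robin(rows: list[Row], node_count: int) -> list[list[Row]]:
    """Assign row i to node i % node_count (assumed placement rule)."""
    nodes: list[list[Row]] = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        nodes[index % node_count].append(row)
    return nodes


def scatter_gather(nodes: list[list[Row]],
                   predicate: Callable[[Row], bool]) -> list[Row]:
    """Run the same filter on every node, then concatenate the partial results."""
    partial_results = [[row for row in node if predicate(row)] for node in nodes]
    return [row for part in partial_results for row in part]


nodes = distribute_round_robin(ROWS, node_count=3)

# Row IDs in node order, which reproduces the ordering listed on the slide.
layout = [row_id for node in nodes for row_id, _ in node]

# Example query: names starting with "P", evaluated per node and merged.
matches = scatter_gather(nodes, lambda row: row[1].startswith("P"))
```

Each node only scans its own share of the rows, which is why adding nodes can shorten the time a query takes.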
30. Compute Flexibility
Category   Workload               Instance families
Compute    Machine Learning       C4 Family, C3 Family
Memory     Interactive Analysis   X1 Family, R3 Family
Storage    Large HDFS             D2 Family, I2 Family
General    Batch Process          M4 Family, M3 Family
31. Cost & Time
[Two charts, each with axes # CPUs and Time]
Wall clock time: 1 hour
Wall clock time: 10 hours
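The two wall clock times on this slide suggest a trade between cluster size and run time. A minimal sketch of that arithmetic follows, assuming the job scales perfectly linearly and that cost is simply CPUs times hours times an hourly rate. The CPU counts and the price are hypothetical; the slide gives neither.

```python
def cluster_cost(cpu_count: int, wall_clock_hours: float,
                 price_per_cpu_hour: float) -> float:
    """Cost of a run billed per CPU-hour (simplified billing model)."""
    return cpu_count * wall_clock_hours * price_per_cpu_hour


PRICE_PER_CPU_HOUR = 0.05  # hypothetical rate, not an AWS price

# Same total work (100 CPU-hours), spread over different cluster sizes.
small_cluster = cluster_cost(cpu_count=10, wall_clock_hours=10,
                             price_per_cpu_hour=PRICE_PER_CPU_HOUR)
large_cluster = cluster_cost(cpu_count=100, wall_clock_hours=1,
                             price_per_cpu_hour=PRICE_PER_CPU_HOUR)

same_cost = small_cluster == large_cluster
```

Under these assumptions the bill is the same, but the larger cluster returns the answer ten times sooner. Real jobs rarely scale perfectly, so the comparison should be treated as an upper bound on the benefit.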
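42. Comparison of SQL Processing engines
Engine                           Data Structure   Languages     Data Store
(name not in extracted text)     Semi             SQL, HiveQL   S3/HDFS
Amazon Athena                    Semi             SQL           S3
Amazon Redshift                  Full             SQL           Local
(name not in extracted text)     Semi             SQL           S3/HDFS
Performance: [values not recoverable from the extracted text]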
43. Comparison of SQL Processing engines
Use Case
• (name not in extracted text): Transformation
• (name not in extracted text): SQL Queries for S3/HDFS
• Amazon Redshift: Fully Featured SQL Database
• Amazon Athena: Serverless SQL Queries for S3
55. Ingest Serving
Speed (Real-time)
Scale (Batch)
Data analysts
Data scientists
Business users
Engagement platforms
Automation / events
Sources
Amazon Kinesis
AWS Lambda
Application
Amazon EMR
Streaming
Amazon EMR
Data Lake
Amazon Redshift
ETL
Amazon Athena
EC2
AWS CLI & SDK
Amazon S3
Amazon EMR
Amazon S3
AWS CloudTrail
AWS IAM
Amazon CloudWatch
AWS KMS
56. New X1 Instance - Tons of Memory
• Large-scale, in-memory applications
• Intel® Xeon® E7 8880 v3 Haswell processors
• Up to 2TB of memory
• Up to 128 vCPUs per instance
57. Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads
such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the
performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that
adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of
transactions that are independent to accelerate throughput
P state & C state control – provides granular performance tuning for cores and sleep
states to improve overall application performance
60. Register now if you would like to join
today's upcoming session.
https://pages.awscloud.com/aws-workshop-online.html
4pm – 5pm (SGT)
Modern Data Architectures
for Real-time Analytics and Engagement
Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances.
The EMR File System allows EMR clusters to efficiently and securely use Amazon S3 as an object store for Hadoop. You can store your data in Amazon S3 and use multiple Amazon EMR clusters to process the same data set. Each cluster can be optimized for a particular workload, which can be more efficient than a single cluster serving multiple workloads with different requirements. For example, you might have one cluster that is optimized for I/O and another that is optimized for CPU, each processing the same data set in Amazon S3. Additionally, by storing your input and output data in Amazon S3, you can shut down clusters when they are no longer needed.
Amazon EMR makes it easy to use Spot Instances so you can save both time and money. Amazon EMR clusters include 'core nodes' that run HDFS and 'task nodes' that do not; task nodes are ideal for Spot because if the Spot price increases and you lose those instances, you will not lose data stored in HDFS.
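The core/task split described above can be sketched as an instance group layout. The dictionary keys below follow the `InstanceGroups` shape of the EMR RunJobFlow request as commonly documented; check them against the current API reference before use. The function name, instance type, node counts, and bid price are hypothetical, and nothing here calls AWS.

```python
from typing import Any


def build_instance_groups(core_count: int, task_count: int,
                          instance_type: str = "m4.large",
                          task_bid_price: str = "0.05") -> list[dict[str, Any]]:
    """Keep HDFS-bearing nodes On-Demand and put only task nodes on Spot."""
    return [
        {   # Master node coordinates the cluster.
            "Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
            "InstanceType": instance_type, "InstanceCount": 1,
        },
        {   # Core nodes run HDFS, so losing them could lose data.
            "Name": "Core (HDFS)", "InstanceRole": "CORE", "Market": "ON_DEMAND",
            "InstanceType": instance_type, "InstanceCount": core_count,
        },
        {   # Task nodes hold no HDFS data, so a Spot interruption is safe.
            "Name": "Task (Spot)", "InstanceRole": "TASK", "Market": "SPOT",
            "BidPrice": task_bid_price,  # hypothetical bid, passed as a string
            "InstanceType": instance_type, "InstanceCount": task_count,
        },
    ]


groups = build_instance_groups(core_count=2, task_count=4)
spot_roles = [g["InstanceRole"] for g in groups if g["Market"] == "SPOT"]
total_nodes = sum(g["InstanceCount"] for g in groups)
```

A list like this would typically be passed as part of the `Instances` argument when creating a cluster through an SDK. Keeping Spot confined to the task group is the design choice that matters.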
Amazon EMR supports powerful and proven Hadoop tools such as Hive, Pig, HBase, and Impala. Additionally, it can run distributed computing frameworks besides Hadoop MapReduce such as Spark or Presto using bootstrap actions. You can also use Hue and Zeppelin as GUIs for interacting with applications on your cluster.
Athena is a fully managed, serverless service.
There is no provisioning or administration to perform, and the service is available instantly.
Pricing is per query.
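As a rough illustration of per-query pricing, the sketch below assumes each query is billed on the amount of data it scans, with a price per terabyte and a minimum billable size. The rate, the minimum, and the use of binary units are assumptions for illustration and should be checked against the current Athena pricing page; the function name is hypothetical.

```python
TB = 1024 ** 4
MB = 1024 ** 2


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = 5.0,        # assumed rate
                        minimum_bytes: int = 10 * MB      # assumed minimum
                        ) -> float:
    """Estimate the charge for one query from the bytes it scans."""
    billable_bytes = max(bytes_scanned, minimum_bytes)
    return billable_bytes / TB * price_per_tb


# Scanning a full 1 TB data set versus one tenth of it
# (for example after partitioning or converting to a columnar format).
full_scan_cost = estimate_query_cost(1 * TB)
reduced_scan_cost = estimate_query_cost(TB // 10)
```

Under this model, anything that reduces the bytes a query reads, such as compression, partitioning, or columnar formats, lowers the cost of that query directly.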
We’ve looked at a few different processing engines for your data lake.
Here we compare the options so you can choose the best one for your use case.
Amazon Elasticsearch Service is a managed service for Elasticsearch.
Use the AWS Management Console or simple API calls to access a production-ready Amazon Elasticsearch cluster in minutes without worrying about infrastructure provisioning, or installing and maintaining Elasticsearch software.
Amazon Elasticsearch Service simplifies time-consuming management tasks, such as ensuring high availability, patch management, failure detection and node replacement, backups, and monitoring.
More: https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/