SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Julien Simon, Principal Technical Evangelist, AWS
julsimon@amazon.fr "
@julsimon
Picking the right AWS backend "
for your Java application
Who am I?
Hacker. Headbanger. Harley Rider. Hunter.

Love/hate relationship with all versions of Java, from JDK 1.0 and
PersonalJava to Java 8 :D

20+ years in R&D teams, from smartcards to web platforms

VP Eng/CTO @ Digiplug, Pixmania, Criteo, Aldebaran Robotics, Viadeo

Now walking the Earth for Amazon Web Services
What to expect
•  Writing Java apps on AWS
•  Databases
•  Amazon RDS
•  Amazon DynamoDB
•  Analytics
•  Hive on Amazon EMR
•  Amazon Athena
•  Amazon Redshift
•  Conclusion
Code available at https://github.com/juliensimon/aws/tree/master/javabackends
Writing Java apps on AWS
Four deployment options




https://docs.aws.amazon.com/sdk-for-java/
https://github.com/aws/aws-sdk-java
Java SDK for the AWS API (Java 1.6+)
AWS Lambda
Java 8
Open Source frameworks:"
Serverless, Gordon, Apex, ….
Amazon EC2 Container Service
AWS ElasticBeanstalk

Java 6/7/8 with Tomcat 7/8
Java 7/8 with Glassfish 4
Amazon EC2
AWS plugin for Eclipse
https://aws.amazon.com/documentation/awstoolkiteclipse/
3rd party plugins for Intellij IDEA
•  AWS Elastic Beanstalk Integration
https://plugins.jetbrains.com/plugin/7274-aws-elastic-beanstalk-integration

•  AWS CloudFormation
https://plugins.jetbrains.com/plugin/7371-aws-cloudformation 
•  AWS Manager – almost 2 years old :-/
https://plugins.jetbrains.com/plugin/4558-aws-manager
A quick reminder on AWS Credentials
•  AWS Identity and Access Management (IAM)
•  Users are authenticated with an Access Key and a Secret Key
•  AWS APIs and AWS resources require permissions
•  Permissions are defined in policies
•  Attached directly to Users or Groups
•  Attached to Roles, in turn attached to Services (EC2, Lambda, etc.)
Managing credentials
•  Please do not hardcode them in your application
•  Please do not store them on EC2 instances
•  It WILL end in tears!
•  AWS credentials: use IAM Roles
•  Backend credentials: use the EC2 SM Parameter Store
•  Automatic encryption with Amazon KMS
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/auth/AWSCredentialsProvider.html 
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-paramstore.html
Reference "
architecture
Amazon
Athena
Databases
No infrastructure
management
Scale up/down
Cost-effective
Instant provisioning
Application
compatibility
Amazon Relational Database Service
99.95% SLA
https://aws.amazon.com/rds/ 
https://aws.amazon.com/rds/sla/
Amazon RDS – 6 Database Engines
•  Amazon Aurora
•  MySQL 5.5.46 à 5.7.16
•  MariaDB 10.0.17 à 10.1.19
•  PostgreSQL 9.3.12-R1 à 9.6.2-R1
•  Oracle 11.2.0.4.v1 à 12.1.0.2.v7
•  SQL Server 2008 à 2016
Aurora design principles https://dl.acm.org/citation.cfm?id=3056101
Selected Amazon RDS customers
Amazon Aurora demo
Java SDK: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/rds/AmazonRDSClient.html
JDBC drivers: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/java-rds.html
Amazon DynamoDB
Document or Key-Value
 Scales to Any Workload
Fully Managed NoSQL
Access Control
 Event Driven Programming
Fast and Consistent
https://aws.amazon.com/dynamodb/
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
Expedia is a leader in the $1 trillion travel industry, with an
extensive portfolio that includes some of the world’s most
trusted travel brands.
With DynamoDB we were up
and running in a less than day,
and there is no need for a
team to maintain.
Kuldeep Chowhan
Principal Engineer, Expedia
”
“
 •  Expedia’s real-time analytics
application collects data for its “test
& learn” experiments on Expedia
sites.
•  The analytics application processes
~200 million messages daily.
•  Ease of setup, monitoring, and
scaling were key factors in choosing
DynamoDB.
Case Study – Expedia
https://aws.amazon.com/fr/blogs/big-data/how-expedia-implemented-near-real-time-analysis-of-interdependent-datasets/
Amazon DynamoDB demo
Java SDK: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CodeSamples.Java.html
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-dynamodb.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
Amazon ElastiCache
 Amazon"
DynamoDB
Amazon"
RDS/Aurora
Amazon
ElasticSearch
Amazon S3
 Amazon Glacier
Average
latency
ms
 ms
 ms, sec
 ms,sec
 ms,sec,min
(~ size)
hrs
Typical
data stored
GB
 GB–TBs
(no limit)
GB–TB
(64 TB max)
GB–TB
 MB–PB
(no limit)
GB–PB
(no limit)
Typical
item size
B-KB
 KB
(400 KB max)
KB
(64 KB max)
B-KB
(2 GB max)
KB-TB
(5 TB max)
GB"
(40 TB max)
Request 
Rate
High – very high
 Very high
(no limit)
High
 High
 Low – high
(no limit)
Very low

Storage cost
GB/month
$$
 ¢¢
 ¢¢
 ¢¢

¢
 ¢4/10
Durability
 Low - moderate
 Very high
 Very high
 High
 Very high
 Very high
Availability
 High
2 AZ
Very high 
3 AZ
Very high
3 AZ
High
2 AZ
Very high
3 AZ
Very high
3 AZ
Hot data
 Warm data
 Cold data
Which Data Store Should I Use?
Analytics
Amazon Elastic Map Reduce (EMR)
•  Apache Hadoop, Spark, Hive,etc.
•  Managed service
•  Easy to start, resize & terminate clusters
•  Cost-efficient, especially with Spot Instances
•  Integration with backends


https://aws.amazon.com/emr/
Case study – FINRA
FINRA, the primary regulatory agency for stock brokers in the
US, uses AWS extensively in their IT operations and has
migrated key portions of its technology stack to AWS
including Market Surveillance and Member Regulation. 

For market surveillance, each night FINRA loads
approximately 35 billion rows of data into Amazon S3 and
Amazon EMR (up to 10,000 nodes) to monitor trading activity
on exchanges and market centers in the US.

https://aws.amazon.com/solutions/case-studies/finra/
Hive demo
Java SDK: https://docs.aws.amazon.com/emr/latest/ManagementGuide/calling-emr-with-java-sdk.html
JDBC: https://docs.aws.amazon.com//emr/latest/ReleaseGuide/HiveJDBCDriver.html
Amazon Athena
•  Run read-only SQL queries on S3 data
•  No data loading, no indexing, no nothing
•  No infrastructure to create, manage or scale
•  Service based on Presto
•  Table creation: Apache Hive Data Definition Language
•  ANSI SQL operators and functions: what Presto
supports, with a few exceptions

AWS re:Invent 2016: Introducing Athena (BDA303) https://www.youtube.com/watch?v=DxAuj_Ky5aw
https://prestodb.io
Data formats supported by Athena
•  Unstructured
•  Apache logs, with customizable regular expression
•  Semi-structured
•  delimiter-separated values (CSV, OpenCSV)
•  Tab-separated values (TSV)
•  JSON
•  Structured
•  Apache Parquet https://parquet.apache.org/
•  Apache ORC https://orc.apache.org/ 
•  Apache Avro https://avro.apache.org/ 
•  Compression (LZO, Snappy, Zlib, GZIP) & partitioning
Case Study – DataXu
CDN
Real Time"
Bidding
Retargeting"
Platform
Kinesis
 Attribution & ML 
S3
Reporting
Data Visualization
Data 
Pipeline
ETL (Spark SQL)
Amazon Athena
180TB of Log Data per Day
https://aws.amazon.com/fr/solutions/case-studies/dataxu/
Amazon Athena demo
Java SDK: https://docs.aws.amazon.com/athena/latest/ug/code-samples.html 
JDBC: https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
•  Relational data warehouse
•  SQL is all you need to know 
•  Fully managed
•  Massively parallel
•  Petabyte scale
•  As low as $1000/TB/year
•  Athena-like capabilities with Redshift Spectrum
Amazon Redshift
https://aws.amazon.com/redshift 
http://www.allthingsdistributed.com/2012/11/amazon-redshift.html 
Intro to Amazon Redshift Spectrum https://www.youtube.com/watch?v=gchd2sDhSuY
Amazon Redshift Architecture
Ingestion
Backup
Restore
S3 / EMR / DynamoDB
Leader Node
10 GigE
Spectrum
Query external"
tables stored"
in S3
What customers says about Amazon Redshift
“Redshift is twenty times faster than Hive” (5x – 20x reduction in query times) link

“Queries that used to take hours came back in seconds. Our analysts are orders of
magnitude more productive.” (20x – 40x reduction in query times) link

…[Redshift] performance has blown away everyone here (we generally see 50-100x
speedup over Hive). link
“Team played with Redshift today and concluded it is ****** awesome. "
Un-indexed complex queries returning in < 10s.”
“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts
an alternative to Hadoop.”
Channel
 We regularly process multibillion row datasets and we do that in a matter of hours. link
Amazon Redshift demo
Java SDK: https://docs.aws.amazon.com/redshift/latest/mgmt/using-aws-sdk-for-java.html 
JDBC driver: https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html
Amazon Redshift
 Amazon Athena
 Amazon EMR
Presto
 Spark
 Hive
Use case
 Optimized for data
warehousing
Ad-hoc Interactive
Queries
Interactive
Query
General purpose
(iterative ML, RT, ..)
Batch
Scale/throughput
 ~Nodes
Automatic (Spectrum)
Automatic / No limits
 ~ Nodes
AWS Managed
Service
Yes
 Yes, Serverless
 Yes
Storage
 Local storage"
Amazon S3 (Spectrum)
Amazon S3
 Amazon S3, HDFS
Optimization
 Columnar storage, data
compression, and zone
maps 
CSV, TSV, JSON,
Parquet, ORC, Apache
Web log
Framework dependent
Metadata
 Amazon Redshift managed
 Athena Catalog
Manager
Hive Meta-store
BI tools supports
 Yes (JDBC/ODBC)
 Yes (JDBC)
 Yes (JDBC/ODBC & Custom)
Access controls
 Users, groups, and access
controls
AWS IAM
 Integration with LDAP
UDF support
 Yes (Scalar)
 No
 Yes
Slow
Conclusion
Reference "
architecture
Amazon
Athena
•  Converts your tables, views, stored procedures, "
and data manipulation language to RDS or"
Amazon Redshift
•  Highlights where manual edits are needed
AWS Schema Conversion Tool
https://aws.amazon.com/dms/
ü  Move data to the same or different database engine
ü  Move data to Redshift, DynamoDB or S3
ü  Keep your apps running during the migration
ü  Start your first migration in 10 minutes or less
ü  Replicate within, to, or from Amazon EC2 or RDS
AWS Database Migration Service
https://aws.amazon.com/dms/
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html 
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html 
https://aws.amazon.com/blogs/database/database-migration-what-do-you-need-to-know-before-you-start/
AWS is a rich and lively environment for Java platforms

Your choice of backends: relational, NoSQL, Big Data, analytics

The tools you need, with less or no infrastructure drama

Built-in high availability, scalability, security & compliance

Focus on creativity and productivity, not on plumbing
Thank you! 
Julien Simon, Principal Technical Evangelist, AWS
julsimon@amazon.fr "
@julsimon

Contenu connexe

Tendances

T1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsT1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on aws
Amazon Web Services
 
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
Amazon Web Services
 

Tendances (20)

T1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsT1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on aws
 
Introduction to AWS (October 2017)
Introduction to AWS (October 2017)Introduction to AWS (October 2017)
Introduction to AWS (October 2017)
 
Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)
 
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
 
Amazon EC2 & VPC HOL
Amazon EC2 & VPC HOLAmazon EC2 & VPC HOL
Amazon EC2 & VPC HOL
 
Get the Most Bang for your Buck with #EC2 #Winning
Get the Most Bang for your Buck with #EC2 #WinningGet the Most Bang for your Buck with #EC2 #Winning
Get the Most Bang for your Buck with #EC2 #Winning
 
More nines for your dimes: Improving availability and lowering costs using au...
More nines for your dimes: Improving availability and lowering costs using au...More nines for your dimes: Improving availability and lowering costs using au...
More nines for your dimes: Improving availability and lowering costs using au...
 
AWS Partner Presentation - SAP
AWS Partner Presentation - SAP AWS Partner Presentation - SAP
AWS Partner Presentation - SAP
 
Docker clusters on AWS with Amazon ECS and Kubernetes
Docker clusters on AWS with Amazon ECS and KubernetesDocker clusters on AWS with Amazon ECS and Kubernetes
Docker clusters on AWS with Amazon ECS and Kubernetes
 
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20...
 
Introduction to AWS Batch
Introduction to AWS BatchIntroduction to AWS Batch
Introduction to AWS Batch
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 SpotIntroduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
 
Machine Learning in Action
Machine Learning in ActionMachine Learning in Action
Machine Learning in Action
 
Get the Most Bang for Your Buck with #EC2 #WINNING
Get the Most Bang for Your Buck with #EC2 #WINNINGGet the Most Bang for Your Buck with #EC2 #WINNING
Get the Most Bang for Your Buck with #EC2 #WINNING
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
AWS Storage Tiers for Enterprise Workloads - Best Practices (STG301) | AWS re...
 
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
 

Similaire à Picking the right AWS backend for your Java application (May 2017)

Serverless Architecture and Best Practices
Serverless Architecture and Best PracticesServerless Architecture and Best Practices
Serverless Architecture and Best Practices
Amazon Web Services
 

Similaire à Picking the right AWS backend for your Java application (May 2017) (20)

Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)Picking the right AWS backend for your application (September 2017)
Picking the right AWS backend for your application (September 2017)
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Your First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web ServicesYour First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web Services
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Raleigh DevDay 2017: Build a serverless web application in one day workshop
Raleigh DevDay 2017: Build a serverless web application in one day workshopRaleigh DevDay 2017: Build a serverless web application in one day workshop
Raleigh DevDay 2017: Build a serverless web application in one day workshop
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS Cloud
 
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
 
Deep Dive RDS & Aurora - Pop-up Loft TLV 2017
Deep Dive RDS & Aurora - Pop-up Loft TLV 2017Deep Dive RDS & Aurora - Pop-up Loft TLV 2017
Deep Dive RDS & Aurora - Pop-up Loft TLV 2017
 
AWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAWS Webcast - Understanding database options
AWS Webcast - Understanding database options
 
What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?
 
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
(BDT302) Big Data Beyond Hadoop: Running Mahout, Giraph, and R on Amazon EMR ...
 
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
 Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T... Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
 
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWSServerless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
 
(SOV204) Scaling Up to Your First 10 Million Users | AWS re:Invent 2014
(SOV204) Scaling Up to Your First 10 Million Users | AWS re:Invent 2014(SOV204) Scaling Up to Your First 10 Million Users | AWS re:Invent 2014
(SOV204) Scaling Up to Your First 10 Million Users | AWS re:Invent 2014
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Serverless Architecture and Best Practices
Serverless Architecture and Best PracticesServerless Architecture and Best Practices
Serverless Architecture and Best Practices
 

Plus de Julien SIMON

Plus de Julien SIMON (20)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 

Picking the right AWS backend for your Java application (May 2017)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Julien Simon, Principal Technical Evangelist, AWS julsimon@amazon.fr " @julsimon Picking the right AWS backend " for your Java application
  • 2. Who am I? Hacker. Headbanger. Harley Rider. Hunter. Love/hate relationship with all versions of Java, from JDK 1.0 and PersonalJava to Java 8 :D 20+ years in R&D teams, from smartcards to web platforms VP Eng/CTO @ Digiplug, Pixmania, Criteo, Aldebaran Robotics, Viadeo Now walking the Earth for Amazon Web Services
  • 3. What to expect •  Writing Java apps on AWS •  Databases •  Amazon RDS •  Amazon DynamoDB •  Analytics •  Hive on Amazon EMR •  Amazon Athena •  Amazon Redshift •  Conclusion Code available at https://github.com/juliensimon/aws/tree/master/javabackends
  • 5. Four deployment options https://docs.aws.amazon.com/sdk-for-java/ https://github.com/aws/aws-sdk-java Java SDK for the AWS API (Java 1.6+) AWS Lambda Java 8 Open Source frameworks:" Serverless, Gordon, Apex, …. Amazon EC2 Container Service AWS ElasticBeanstalk Java 6/7/8 with Tomcat 7/8 Java 7/8 with Glassfish 4 Amazon EC2
  • 6. AWS plugin for Eclipse https://aws.amazon.com/documentation/awstoolkiteclipse/
  • 7. 3rd party plugins for Intellij IDEA •  AWS Elastic Beanstalk Integration https://plugins.jetbrains.com/plugin/7274-aws-elastic-beanstalk-integration •  AWS CloudFormation https://plugins.jetbrains.com/plugin/7371-aws-cloudformation •  AWS Manager – almost 2 years old :-/ https://plugins.jetbrains.com/plugin/4558-aws-manager
  • 8. A quick reminder on AWS Credentials •  AWS Identity and Access Management (IAM) •  Users are authenticated with an Access Key and a Secret Key •  AWS APIs and AWS resources require permissions •  Permissions are defined in policies •  Attached directly to Users or Groups •  Attached to Roles, in turn attached to Services (EC2, Lambda, etc.)
  • 9. Managing credentials •  Please do not hardcode them in your application •  Please do not store them on EC2 instances •  It WILL end in tears! •  AWS credentials: use IAM Roles •  Backend credentials: use the EC2 SM Parameter Store •  Automatic encryption with Amazon KMS https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/auth/AWSCredentialsProvider.html https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-paramstore.html
  • 12. No infrastructure management Scale up/down Cost-effective Instant provisioning Application compatibility Amazon Relational Database Service 99.95% SLA https://aws.amazon.com/rds/ https://aws.amazon.com/rds/sla/
  • 13. Amazon RDS – 6 Database Engines •  Amazon Aurora •  MySQL 5.5.46 à 5.7.16 •  MariaDB 10.0.17 à 10.1.19 •  PostgreSQL 9.3.12-R1 à 9.6.2-R1 •  Oracle 11.2.0.4.v1 à 12.1.0.2.v7 •  SQL Server 2008 à 2016 Aurora design principles https://dl.acm.org/citation.cfm?id=3056101
  • 14. Selected Amazon RDS customers
  • 15. Amazon Aurora demo Java SDK: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/rds/AmazonRDSClient.html JDBC drivers: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/java-rds.html
  • 16. Amazon DynamoDB Document or Key-Value Scales to Any Workload Fully Managed NoSQL Access Control Event Driven Programming Fast and Consistent https://aws.amazon.com/dynamodb/ http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
  • 17. Expedia is a leader in the $1 trillion travel industry, with an extensive portfolio that includes some of the world’s most trusted travel brands. With DynamoDB we were up and running in a less than day, and there is no need for a team to maintain. Kuldeep Chowhan Principal Engineer, Expedia ” “ •  Expedia’s real-time analytics application collects data for its “test & learn” experiments on Expedia sites. •  The analytics application processes ~200 million messages daily. •  Ease of setup, monitoring, and scaling were key factors in choosing DynamoDB. Case Study – Expedia https://aws.amazon.com/fr/blogs/big-data/how-expedia-implemented-near-real-time-analysis-of-interdependent-datasets/
  • 18. Amazon DynamoDB demo Java SDK: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CodeSamples.Java.html https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-dynamodb.html https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
  • 19. Amazon ElastiCache Amazon" DynamoDB Amazon" RDS/Aurora Amazon ElasticSearch Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec ms,sec,min (~ size) hrs Typical data stored GB GB–TBs (no limit) GB–TB (64 TB max) GB–TB MB–PB (no limit) GB–PB (no limit) Typical item size B-KB KB (400 KB max) KB (64 KB max) B-KB (2 GB max) KB-TB (5 TB max) GB" (40 TB max) Request Rate High – very high Very high (no limit) High High Low – high (no limit) Very low Storage cost GB/month $$ ¢¢ ¢¢ ¢¢ ¢ ¢4/10 Durability Low - moderate Very high Very high High Very high Very high Availability High 2 AZ Very high 3 AZ Very high 3 AZ High 2 AZ Very high 3 AZ Very high 3 AZ Hot data Warm data Cold data Which Data Store Should I Use?
  • 21. Amazon Elastic Map Reduce (EMR) •  Apache Hadoop, Spark, Hive,etc. •  Managed service •  Easy to start, resize & terminate clusters •  Cost-efficient, especially with Spot Instances •  Integration with backends https://aws.amazon.com/emr/
  • 22. Case study – FINRA FINRA, the primary regulatory agency for stock brokers in the US, uses AWS extensively in their IT operations and has migrated key portions of its technology stack to AWS including Market Surveillance and Member Regulation. For market surveillance, each night FINRA loads approximately 35 billion rows of data into Amazon S3 and Amazon EMR (up to 10,000 nodes) to monitor trading activity on exchanges and market centers in the US. https://aws.amazon.com/solutions/case-studies/finra/
  • 23. Hive demo Java SDK: https://docs.aws.amazon.com/emr/latest/ManagementGuide/calling-emr-with-java-sdk.html JDBC: https://docs.aws.amazon.com//emr/latest/ReleaseGuide/HiveJDBCDriver.html
  • 24. Amazon Athena •  Run read-only SQL queries on S3 data •  No data loading, no indexing, no nothing •  No infrastructure to create, manage or scale •  Service based on Presto •  Table creation: Apache Hive Data Definition Language •  ANSI SQL operators and functions: what Presto supports, with a few exceptions AWS re:Invent 2016: Introducing Athena (BDA303) https://www.youtube.com/watch?v=DxAuj_Ky5aw https://prestodb.io
  • 25. Data formats supported by Athena •  Unstructured •  Apache logs, with customizable regular expression •  Semi-structured •  delimiter-separated values (CSV, OpenCSV) •  Tab-separated values (TSV) •  JSON •  Structured •  Apache Parquet https://parquet.apache.org/ •  Apache ORC https://orc.apache.org/ •  Apache Avro https://avro.apache.org/ •  Compression (LZO, Snappy, Zlib, GZIP) & partitioning
  • 26. Case Study – DataXu CDN Real Time" Bidding Retargeting" Platform Kinesis Attribution & ML S3 Reporting Data Visualization Data Pipeline ETL (Spark SQL) Amazon Athena 180TB of Log Data per Day https://aws.amazon.com/fr/solutions/case-studies/dataxu/
  • 27. Amazon Athena demo Java SDK: https://docs.aws.amazon.com/athena/latest/ug/code-samples.html JDBC: https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html
  • 28. •  Relational data warehouse •  SQL is all you need to know •  Fully managed •  Massively parallel •  Petabyte scale •  As low as $1000/TB/year •  Athena-like capabilities with Redshift Spectrum Amazon Redshift https://aws.amazon.com/redshift http://www.allthingsdistributed.com/2012/11/amazon-redshift.html Intro to Amazon Redshift Spectrum https://www.youtube.com/watch?v=gchd2sDhSuY
  • 29. Amazon Redshift Architecture Ingestion Backup Restore S3 / EMR / DynamoDB Leader Node 10 GigE Spectrum Query external" tables stored" in S3
  • 30. What customers says about Amazon Redshift “Redshift is twenty times faster than Hive” (5x – 20x reduction in query times) link “Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x – 40x reduction in query times) link …[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive). link “Team played with Redshift today and concluded it is ****** awesome. " Un-indexed complex queries returning in < 10s.” “Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.” Channel We regularly process multibillion row datasets and we do that in a matter of hours. link
  • 31. Amazon Redshift demo Java SDK: https://docs.aws.amazon.com/redshift/latest/mgmt/using-aws-sdk-for-java.html JDBC driver: https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html
  • 32. Amazon Redshift Amazon Athena Amazon EMR Presto Spark Hive Use case Optimized for data warehousing Ad-hoc Interactive Queries Interactive Query General purpose (iterative ML, RT, ..) Batch Scale/throughput ~Nodes Automatic (Spectrum) Automatic / No limits ~ Nodes AWS Managed Service Yes Yes, Serverless Yes Storage Local storage" Amazon S3 (Spectrum) Amazon S3 Amazon S3, HDFS Optimization Columnar storage, data compression, and zone maps CSV, TSV, JSON, Parquet, ORC, Apache Web log Framework dependent Metadata Amazon Redshift managed Athena Catalog Manager Hive Meta-store BI tools supports Yes (JDBC/ODBC) Yes (JDBC) Yes (JDBC/ODBC & Custom) Access controls Users, groups, and access controls AWS IAM Integration with LDAP UDF support Yes (Scalar) No Yes Slow
  • 35. •  Converts your tables, views, stored procedures, " and data manipulation language to RDS or" Amazon Redshift •  Highlights where manual edits are needed AWS Schema Conversion Tool https://aws.amazon.com/dms/
  • 36. ü  Move data to the same or different database engine ü  Move data to Redshift, DynamoDB or S3 ü  Keep your apps running during the migration ü  Start your first migration in 10 minutes or less ü  Replicate within, to, or from Amazon EC2 or RDS AWS Database Migration Service https://aws.amazon.com/dms/ http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html https://aws.amazon.com/blogs/database/database-migration-what-do-you-need-to-know-before-you-start/
  • 37. AWS is a rich and lively environment for Java platforms Your choice of backends: relational, NoSQL, Big Data, analytics The tools you need, with less or no infrastructure drama Built-in high availability, scalability, security & compliance Focus on creativity and productivity, not on plumbing
  • 38. Thank you! Julien Simon, Principal Technical Evangelist, AWS julsimon@amazon.fr " @julsimon