- Identify best practices to migrate from on-premise deployments to Amazon EMR
- Understand hot to move data from HDFS to Amazon S3
- Use Amazon EC2 Spot instances to save costs
Businesses are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to Amazon EMR in order to save costs, increase availability, and improve performance. Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. This tech talk will focus on identifying the components and workflows in your current environment and providing the best practices to migrate these workloads to Amazon EMR. We will explain how to move from HDFS to Amazon S3 as a durable storage layer, and how to lower costs with Amazon EC2 Spot instances and Auto Scaling. Additionally, we will go over common security recommendations and tuning tips to accelerate the time to production.