AWS FOR DATA WEBINAR – SAGEMAKER WITH EMR
© 2022, Amazon Web Services, Inc. or its affiliates.
Preparing Data and Developing Machine Learning Models with Amazon EMR and SageMaker
AWS FOR DATA WEBINAR
강성문
Sr. AIML Special Solutions Architect
AWS
Agenda
SageMaker vs EMR
Preparing large-scale data and developing machine learning models with EMR and SageMaker
▪ Demo 1. Environment setup
▪ Demo 2. Machine learning model development
Summary
How are Amazon SageMaker and EMR different?
Amazon EMR (Elastic MapReduce)
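As a minimal illustration of what an EMR cluster for this setup might contain, the sketch below builds the keyword arguments for boto3's EMR `run_job_flow` call. It assumes Studio notebooks reach Spark through Apache Livy (which listens on port 8998 by default), so Livy is installed alongside Spark. The release label, instance type and counts, and the default role names `EMR_EC2_DefaultRole`/`EMR_DefaultRole` are assumptions for illustration, not values from the webinar. The client is passed in by the caller (for example `boto3.client("emr")`), so the request builder itself has no AWS dependency.

```python
def build_emr_cluster_request(name, subnet_id, release_label="emr-6.6.0",
                              instance_type="m5.xlarge", core_count=2):
    """Build keyword arguments for EMR's run_job_flow (illustrative defaults)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed release; pick a current one
        # Spark does the processing; Livy exposes the REST endpoint SparkMagic uses.
        "Applications": [{"Name": "Spark"}, {"Name": "Livy"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": instance_type, "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": instance_type,
                 "InstanceCount": core_count},
            ],
            # Keep the cluster up between interactive notebook sessions.
            "KeepJobFlowAliveWhenNoSteps": True,
            # Assumed: a subnet in the same VPC as the Studio domain, so Livy is reachable.
            "Ec2SubnetId": subnet_id,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile name
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role name
        "VisibleToAllUsers": True,
    }


def create_cluster(emr_client, name, subnet_id, **options):
    """Launch the cluster; emr_client is e.g. boto3.client("emr"). Returns the cluster ID."""
    response = emr_client.run_job_flow(**build_emr_cluster_request(name, subnet_id, **options))
    return response["JobFlowId"]
```

Keeping the request builder separate from the API call makes it easy to review or unit-test the cluster definition before anything is launched.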
Amazon SageMaker
SageMaker Studio: integrated development environment (IDE) for ML
PREPARE (data preparation)
▪ SageMaker Ground Truth: label training data for machine learning
▪ SageMaker Data Wrangler: aggregate and prepare data for machine learning
▪ SageMaker Processing: built-in Python, BYO R/Spark
▪ SageMaker Feature Store: store, update, retrieve, and share features
▪ SageMaker Clarify: detect bias and understand model predictions
BUILD
▪ SageMaker Studio notebooks: Jupyter notebooks with elastic compute and sharing
▪ Built-in and bring-your-own algorithms: dozens of optimized algorithms, or bring your own
▪ Local mode: test and prototype on your local machine
▪ SageMaker Autopilot: automatically create machine learning models with full visibility
▪ SageMaker JumpStart: pre-built solutions for common use cases
TRAIN & TUNE
▪ One-click training: distributed infrastructure management
▪ SageMaker Experiments: capture, organize, and compare every step
▪ Automatic model tuning: hyperparameter optimization
▪ Distributed training libraries: training for large datasets and models
▪ SageMaker Debugger: debug and profile training runs
▪ Managed spot training: reduce training cost by up to 90%
DEPLOY & MANAGE
▪ Fully managed deployment: ultra-low latency, high throughput
▪ Kubernetes & Kubeflow integration: simplify Kubernetes-based machine learning
▪ Multi-model endpoints: reduce cost by hosting multiple models per instance
▪ SageMaker Model Monitor: maintain accuracy of deployed models
▪ SageMaker Edge Manager: manage and monitor models on edge devices
▪ SageMaker Pipelines: workflow orchestration and automation
Not a comprehensive list. Visit aws.amazon.com/sagemaker for the latest information.
Machine learning cycle
Business problem → ML problem framing → Data collection → Data integration → Data preparation and cleaning → Data visualization and analysis → Feature engineering → Model training and parameter tuning → Model evaluation
Model evaluation: YES → Model deployment → Predictions, with monitoring and debugging; NO → return to earlier stages of the cycle
Build and train models using SageMaker
(Diagram: the machine learning cycle from the previous slide, repeated.)
Manage data on AWS
(Diagram: the machine learning cycle from the previous slides, repeated.)
Example Scenario
Request to preprocess a large-scale dataset
Model development using the preprocessing results
Preparing Large-Scale Data and Developing Machine Learning Models with EMR and SageMaker
Target system architecture (diagram with numbered callouts 1 and 2)
Component 1 – SageMaker Studio notebooks
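Component 2 – AWS Service Catalog
An admin publishes approved products (VMs, containers, services) to AWS Service Catalog, and each user sees a custom product list (example product in the diagram: Bitnami Certified App: WordPress).
✓ Compliance with internal policies
✓ One-click deployment
✓ Automated resource tagging
✓ Budget management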
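Component 2 – AWS Service Catalog
▪ Constraint: security, governance, and deployment controls
▪ Product: an IT service or resource, defined in JSON, YAML, or Terraform
▪ Portfolio: a collection of products
▪ Products list: the end user views the products they are allowed to use
▪ Provisioned products: the end user creates and runs the service or resource
Roles: the AWS Service Catalog Administrator defines constraints, products, and portfolios; the AWS Service Catalog End User browses the products list and provisions products.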
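From the end user's side, the products list and provisioned products above map onto three Service Catalog API calls: `search_products`, `list_provisioning_artifacts`, and `provision_product`. The sketch below assumes a boto3 Service Catalog client (`boto3.client("servicecatalog")`) passed in by the caller. The product name and template parameter keys are hypothetical; they depend on the CloudFormation template the administrator published.

```python
def pick_product(search_response, product_name):
    """ProductId of the product with this exact name in the user's product list."""
    for summary in search_response.get("ProductViewSummaries", []):
        if summary.get("Name") == product_name:
            return summary["ProductId"]
    raise LookupError(f"product {product_name!r} is not available to this user")


def pick_active_artifact(artifacts_response):
    """Id of the last active provisioning artifact (product version) in the response.

    Assumption: the last active entry is the one to use; real code may prefer to
    sort by CreatedTime or honor the artifact's Guidance field.
    """
    active = [a for a in artifacts_response.get("ProvisioningArtifactDetails", [])
              if a.get("Active", True)]
    if not active:
        raise LookupError("no active provisioning artifact")
    return active[-1]["Id"]


def provision_emr_product(sc_client, product_name, provisioned_name, parameters):
    """Look up a permitted product and launch it; returns the provisioning record ID.

    sc_client is e.g. boto3.client("servicecatalog"); parameters maps template
    parameter keys (hypothetical, template-specific) to values.
    """
    products = sc_client.search_products(Filters={"FullTextSearch": [product_name]})
    product_id = pick_product(products, product_name)
    artifacts = sc_client.list_provisioning_artifacts(ProductId=product_id)
    response = sc_client.provision_product(
        ProductId=product_id,
        ProvisioningArtifactId=pick_active_artifact(artifacts),
        ProvisionedProductName=provisioned_name,
        ProvisioningParameters=[{"Key": k, "Value": v} for k, v in parameters.items()],
    )
    return response["RecordDetail"]["RecordId"]
```

Provisioning is asynchronous: the returned record ID can be polled with `describe_record` until the underlying stack has finished creating the cluster.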
Demo 1
[For platform engineers] Setting up an environment for creating and connecting to EMR clusters from SageMaker Studio
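On the platform-engineering side, the setup amounts to publishing an EMR CloudFormation template as a Service Catalog product and granting Studio users access to it. The sketch below shows one plausible sequence of boto3 calls (`create_portfolio`, `create_product`, `associate_product_with_portfolio`, `create_constraint` with a launch constraint, `associate_principal_with_portfolio`). The template URL, role ARNs, and names are hypothetical placeholders, and any additional configuration Studio needs in order to list the templates is not covered here.

```python
import json


def product_request(product_name, template_url, owner, version="v1"):
    """Keyword arguments for create_product, backed by a CloudFormation template URL."""
    return {
        "Name": product_name,
        "Owner": owner,
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": version,
            "Info": {"LoadTemplateFromURL": template_url},
            "Type": "CLOUD_FORMATION_TEMPLATE",
        },
    }


def launch_constraint_parameters(launch_role_arn):
    """A LAUNCH constraint makes Service Catalog create resources with this role,
    so end users do not need direct EMR permissions."""
    return json.dumps({"RoleArn": launch_role_arn})


def publish_emr_product(sc_client, template_url, launch_role_arn, studio_role_arn,
                        portfolio_name="EMR for SageMaker Studio",
                        product_name="emr-cluster", owner="Platform team"):
    """Create a portfolio and product, add a launch constraint, grant the Studio role."""
    portfolio_id = sc_client.create_portfolio(
        DisplayName=portfolio_name, ProviderName=owner)["PortfolioDetail"]["Id"]
    product = sc_client.create_product(**product_request(product_name, template_url, owner))
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
    sc_client.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)
    sc_client.create_constraint(PortfolioId=portfolio_id, ProductId=product_id,
                                Type="LAUNCH",
                                Parameters=launch_constraint_parameters(launch_role_arn))
    # Let the Studio execution role (the end user) see and launch the product.
    sc_client.associate_principal_with_portfolio(PortfolioId=portfolio_id,
                                                 PrincipalARN=studio_role_arn,
                                                 PrincipalType="IAM")
    return portfolio_id, product_id
```

The launch constraint is what implements the "security, governance, deployment control" idea: users launch the product, but the resources are created under a role the platform team controls.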
Target system architecture (diagram with numbered callouts 1 to 3)
Component 3 – Apache Livy and SparkMagic
https://livy.apache.org/
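Livy exposes Spark over a REST API: a client creates an interactive session (`POST /sessions`), submits code as statements (`POST /sessions/{id}/statements`), and polls each statement (`GET /sessions/{id}/statements/{statementId}`) until its state is `available`. The stdlib-only sketch below builds those requests without sending them. The host name used with it is a hypothetical placeholder, 8998 is Livy's default port, and in practice SparkMagic issues these calls for you.

```python
import json
import urllib.request


def livy_request(base_url, method, path, payload=None):
    """Build a JSON request against a Livy server (not sent yet)."""
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session_request(base_url, kind="pyspark"):
    # Interactive session kinds include spark, pyspark, sparkr, and sql.
    return livy_request(base_url, "POST", "/sessions", {"kind": kind})


def submit_statement_request(base_url, session_id, code):
    return livy_request(base_url, "POST", f"/sessions/{session_id}/statements", {"code": code})


def poll_statement_request(base_url, session_id, statement_id):
    return livy_request(base_url, "GET", f"/sessions/{session_id}/statements/{statement_id}")


def statement_text(statement):
    """Plain-text result of a finished statement, or None while it is still running."""
    if statement.get("state") != "available":
        return None
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        raise RuntimeError(output.get("evalue", "statement failed"))
    return output.get("data", {}).get("text/plain")


def send(request, timeout=30):
    """Send a request and decode the JSON response (needs network access to Livy)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical loop sends the session request, polls `GET /sessions/{id}` until the session is `idle`, submits a statement, and then polls it until `statement_text` returns a value.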
Component 3 – Apache Livy and SparkMagic
https://github.com/jupyter-incubator/sparkmagic
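SparkMagic turns those Livy calls into Jupyter kernels and cell magics. Used on its own (outside Studio's built-in EMR connection flow), it reads its Livy endpoint from `~/.sparkmagic/config.json`; the sketch below writes a minimal version of that file. The key names follow SparkMagic's example configuration, while the endpoint URL and the `session_configs` values (forwarded to Livy when a session is created) are assumptions for illustration.

```python
import json
from pathlib import Path


def sparkmagic_config(livy_url, auth="None"):
    """Minimal SparkMagic config pointing every kernel at one Livy endpoint."""
    credentials = {"username": "", "password": "", "url": livy_url, "auth": auth}
    return {
        "kernel_python_credentials": dict(credentials),
        "kernel_scala_credentials": dict(credentials),
        "kernel_r_credentials": dict(credentials),
        # Forwarded to Livy as part of the session-creation body (assumed sizes).
        "session_configs": {"driverMemory": "2g", "executorCores": 2},
    }


def write_sparkmagic_config(config, path=None):
    """Write the config to ~/.sparkmagic/config.json (or a given path) and return the path."""
    target = Path(path) if path is not None else Path.home() / ".sparkmagic" / "config.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2))
    return target
```

Each kernel gets its own copy of the credentials dict so that editing one language's settings later does not silently change the others.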
Demo 2
[For data scientists] Connecting to EMR from SageMaker Studio to prepare data and develop a machine learning model
Summary
Build and train models using SageMaker
(Diagram: the machine learning cycle from the earlier slides, repeated.)
Target system architecture (diagram with numbered callouts 1 to 3)
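Other ways to use Spark with SageMaker
▪ SageMaker Processing: runs a preprocessing script over the data using the SageMaker Spark framework container
▪ SageMaker Spark Library: Spark estimators and models that use SageMaker on Spark DataFrames, for example
• SageMakerEstimator
• KMeansSageMakerEstimator
• PCASageMakerEstimator
• XGBoostSageMakerEstimator
• SageMakerModel
• …
▪ EMR with SageMaker Pipeline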
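For the "EMR with SageMaker Pipeline" option, the unit of work is an EMR step submitted to an existing cluster. The sketch below builds a Spark step that runs `spark-submit` through EMR's `command-runner.jar` and submits it with boto3's `add_job_flow_steps`. The SageMaker Python SDK's `EMRStep` (configured with `EMRStepConfig`) wraps a comparable jar-plus-arguments definition inside a pipeline. The script location and arguments used with it are hypothetical.

```python
def spark_submit_step(name, script_s3_uri, script_args=(), action_on_failure="CONTINUE"):
    """EMR step definition that runs a PySpark script with spark-submit."""
    return {
        "Name": name,
        # CONTINUE moves on to the next step instead of terminating the cluster on failure.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar runs an arbitrary command on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri, *script_args],
        },
    }


def submit_spark_step(emr_client, cluster_id, step):
    """Add the step to a running cluster; emr_client is e.g. boto3.client("emr")."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

The same preprocessing script can then be run interactively from a Studio notebook while it is being developed and as a scheduled step once it is stable.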
References
• SageMaker Studio EMR Integration example code - https://github.com/aws-samples/sagemaker-studio-emr
• SageMaker Studio integration with EMR Workshop - https://catalog.workshops.aws/sagemaker-studio-emr/en-US
• Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.ipynb
• Create and manage Amazon EMR clusters from SageMaker Studio to run interactive Spark and ML workloads - https://aws.amazon.com/ko/blogs/machine-learning/part-1-create-and-manage-amazon-emr-clusters-from-sagemaker-studio-to-run-interactive-spark-and-ml-workloads/
• Prepare data at scale with SageMaker Studio notebooks - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-emr-cluster.html
• Connect SageMaker Studio Notebooks in a VPC to External Resources - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html
• Apache Livy - https://livy.apache.org/
• SparkMagic - https://github.com/jupyter-incubator/sparkmagic
• Use Apache Spark with Amazon SageMaker - https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html
• Amazon SageMaker Processing (with Spark) - https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#amazon-sagemaker-processing
• Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.html
• SageMaker Pipeline Step (with EMR) - https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html
Thank you!
강성문
kseongmo@amazon.com