AWS FOR DATA WEBINAR – SAGEMAKER WITH EMR
© 2022, Amazon Web Services, Inc. or its affiliates.
Preparing Data and Developing Machine Learning Models
with Amazon EMR and SageMaker
AWS FOR DATA WEBINAR
강성문
Sr. AIML Specialist Solutions Architect
AWS
Agenda
SageMaker vs EMR
Preparing large-scale data and developing ML models with EMR and SageMaker
▪ Demo 1. Setting up the environment
▪ Demo 2. Developing a machine learning model
Summary
How do Amazon SageMaker and EMR differ?
Amazon EMR (Elastic MapReduce)
Amazon SageMaker
SageMaker Studio: Integrated development environment (IDE) for ML
PREPARE (data preparation)
• SageMaker Ground Truth: Label training data for machine learning
• SageMaker Data Wrangler: Aggregate and prepare data for machine learning
• SageMaker Processing: Built-in Python, BYO R/Spark
• SageMaker Feature Store: Store, update, retrieve, and share features
• SageMaker Clarify: Detect bias and understand model predictions
BUILD
• SageMaker Studio notebooks: Jupyter notebooks with elastic compute and sharing
• Built-in and bring-your-own algorithms: Dozens of optimized algorithms, or bring your own
• Local mode: Test and prototype on your local machine
• SageMaker Autopilot: Automatically create machine learning models with full visibility
• SageMaker JumpStart: Pre-built solutions for common use cases
TRAIN & TUNE
• One-click training: Distributed infrastructure management
• SageMaker Experiments: Capture, organize, and compare every step
• Automatic model tuning: Hyperparameter optimization
• Distributed training libraries: Training for large datasets and models
• SageMaker Debugger: Debug and profile training runs
• Managed spot training: Reduce training cost by up to 90%
DEPLOY & MANAGE
• Fully managed deployment: Fully managed, ultra-low latency, high throughput
• Kubernetes & Kubeflow integration: Simplify Kubernetes-based machine learning
• Multi-model endpoints: Reduce cost by hosting multiple models per instance
• SageMaker Model Monitor: Maintain accuracy of deployed models
• SageMaker Edge Manager: Manage and monitor models on edge devices
• SageMaker Pipelines: Workflow orchestration and automation
Not a comprehensive list. Visit aws.amazon.com/sagemaker for the latest information.
Machine learning cycle
Business problem → ML problem framing → Data collection → Data integration → Data preparation and cleaning → Data visualization and analysis → Feature engineering → Model training and parameter tuning → Model evaluation
• YES: Model deployment → Monitoring and debugging → Predictions
• NO: iterate on the earlier steps
Build and train models using SageMaker
[Diagram: the machine learning cycle]
Manage data on AWS
[Diagram: the machine learning cycle]
Example scenario
• A request to preprocess a large dataset
• Developing a model from the preprocessing results
Preparing Large-Scale Data and Developing
Machine Learning Models
with EMR and SageMaker
Target system architecture
[Diagram: numbered components 1 and 2]
Component 1 – SageMaker Studio notebooks
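Component 2 – AWS Service Catalog
[Diagram: an admin publishes a custom product list of VMs, containers, and services (for example, Bitnami Certified App: WordPress) in AWS Service Catalog, and users deploy products from that list.]
✓ Compliance with internal policies
✓ One-click deployment
✓ Automated resource tagging
✓ Budget management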
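Component 2 – AWS Service Catalog
AWS Service Catalog Administrator
• Product: IT services and resources, defined as JSON, YAML, or Terraform templates
• Portfolio: a collection of products
• Constraint: security, governance, and deployment control
AWS Service Catalog End User
• Products list: view the list of permitted products
• Provisioned products: create and run services/resources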
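As a rough illustration of the end-user side, the sketch below launches a product from a portfolio with the boto3 Service Catalog client's `provision_product` call. The product ID, provisioning artifact ID, and parameter names (`EmrClusterName`, `CoreInstanceCount`) are hypothetical placeholders; the real parameters depend on the template the administrator published. The client is passed in so the helper can be exercised without AWS credentials.

```python
from typing import Any, Dict, List


def to_provisioning_parameters(params: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Key/Value list that provision_product expects."""
    # Service Catalog takes parameter values as strings.
    return [{"Key": key, "Value": str(value)} for key, value in params.items()]


def provision_emr_product(client, product_id: str, artifact_id: str,
                          name: str, params: Dict[str, Any]) -> str:
    """Launch a Service Catalog product and return the provisioning record ID.

    `client` is expected to be boto3.client("servicecatalog"); any object with a
    compatible provision_product method works, which keeps this testable offline.
    """
    response = client.provision_product(
        ProductId=product_id,                # e.g. "prod-EXAMPLE" (hypothetical)
        ProvisioningArtifactId=artifact_id,  # e.g. "pa-EXAMPLE" (hypothetical)
        ProvisionedProductName=name,
        ProvisioningParameters=to_provisioning_parameters(params),
    )
    # The record ID can be passed to describe_record to follow progress.
    return response["RecordDetail"]["RecordId"]
```

With real credentials, `client` would be `boto3.client("servicecatalog")`, and `client.describe_record(Id=record_id)` reports the provisioning status. Keeping the EMR configuration inside the administrator's template is what lets end users launch clusters without editing cluster settings themselves.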
Demo 1
[For platform engineers] Setting up an
environment for creating and connecting to
EMR clusters from SageMaker Studio
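Once the environment is in place, clusters launched this way should show up as active EMR clusters. As a hedged sketch of checking that from code (not how Studio itself implements its cluster list), the helper below pages through boto3's EMR `list_clusters` results, restricted to the `WAITING` and `RUNNING` states, and keeps clusters whose name starts with a given prefix. The name-prefix convention is an assumption for illustration.

```python
from typing import Dict, List, Optional

ACTIVE_STATES = ["WAITING", "RUNNING"]


def list_active_clusters(emr_client, name_prefix: str = "") -> List[Dict[str, str]]:
    """Return id/name/state for active EMR clusters, following pagination markers.

    `emr_client` is expected to be boto3.client("emr"); a stub with the same
    list_clusters method can be used for offline testing.
    """
    clusters: List[Dict[str, str]] = []
    marker: Optional[str] = None
    while True:
        kwargs = {"ClusterStates": ACTIVE_STATES}
        if marker:
            kwargs["Marker"] = marker  # continue from the previous page
        page = emr_client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            if cluster["Name"].startswith(name_prefix):
                clusters.append({
                    "id": cluster["Id"],
                    "name": cluster["Name"],
                    "state": cluster["Status"]["State"],
                })
        marker = page.get("Marker")
        if not marker:  # no Marker means this was the last page
            return clusters
```

With credentials, a call might look like `list_active_clusters(boto3.client("emr"), "studio-")`, where `"studio-"` is a hypothetical naming prefix.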
Target system architecture
[Diagram: numbered components 1, 2, and 3]
Component 3 – Apache Livy and SparkMagic
• Apache Livy: https://livy.apache.org/
• SparkMagic: https://github.com/jupyter-incubator/sparkmagic
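SparkMagic sends notebook code to the Spark cluster through Livy's REST API. The sketch below makes those calls directly: create an interactive PySpark session (`POST /sessions`), wait until it is `idle`, submit a statement (`POST /sessions/{id}/statements`), and poll it (`GET /sessions/{id}/statements/{statementId}`) until it finishes. It assumes Livy's default port 8998 on the EMR primary node is reachable from the notebook; the host name is a placeholder, and the HTTP function is injectable so the logic can be tested without a cluster.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# An HTTP function takes (method, url, body) and returns the decoded JSON response.
HttpFn = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def urllib_http(method: str, url: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Minimal JSON-over-HTTP helper built on the standard library."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class LivyClient:
    def __init__(self, base_url: str, http: HttpFn = urllib_http, poll_seconds: float = 1.0):
        self.base_url = base_url.rstrip("/")
        self.http = http
        self.poll_seconds = poll_seconds

    def create_session(self, kind: str = "pyspark") -> int:
        session = self.http("POST", f"{self.base_url}/sessions", {"kind": kind})
        session_id = session["id"]
        # Wait until the Spark driver is up and the session accepts statements.
        while session["state"] != "idle":
            if session["state"] in ("error", "dead", "killed"):
                raise RuntimeError(f"Livy session {session_id} failed: {session['state']}")
            time.sleep(self.poll_seconds)
            session = self.http("GET", f"{self.base_url}/sessions/{session_id}", None)
        return session_id

    def run(self, session_id: int, code: str) -> Optional[Dict[str, Any]]:
        url = f"{self.base_url}/sessions/{session_id}/statements"
        statement = self.http("POST", url, {"code": code})
        # Poll until the statement reaches a final state; "available" means output is ready.
        while statement["state"] not in ("available", "error", "cancelled"):
            time.sleep(self.poll_seconds)
            statement = self.http("GET", f"{url}/{statement['id']}", None)
        return statement.get("output")
```

Usage might look like `client = LivyClient("http://<emr-primary-dns>:8998")`, then `sid = client.create_session()` and `client.run(sid, "spark.range(10).count()")`. In the notebook, SparkMagic's cell magics issue equivalent requests on your behalf, so this is mainly useful for understanding or debugging the connection.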
Demo 2
[For data scientists]
Connecting to EMR from SageMaker Studio
to prepare data and develop a machine
learning model
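One concrete hand-off between data preparation and SageMaker training: SageMaker's built-in XGBoost algorithm, when given CSV input, expects the target variable in the first column and no header row. The helper below is a small pure-Python sketch of that reshaping plus a reproducible train/validation split; in the demo workflow the equivalent step would run on the prepared data (for example in Spark, before writing to S3). The column names in the usage are hypothetical.

```python
import csv
import io
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def to_xgboost_csv(rows: Sequence[Row], label: str, features: Sequence[str]) -> str:
    """Serialize rows as header-less CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label]] + [row[name] for name in features])
    return buffer.getvalue()


def train_validation_split(rows: Sequence[Row], validation_fraction: float = 0.2,
                           seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed so the split is reproducible, then cut once."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]
```

For example, `to_xgboost_csv(rows, label="churn", features=["age", "income"])` would produce the training file body for a hypothetical churn dataset; the train and validation outputs are then uploaded as separate S3 channels.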
Summary
Build and train models using SageMaker
[Diagram: the machine learning cycle]
Target system architecture
[Diagram: numbered components 1, 2, and 3]
Other ways to use Spark with SageMaker
• SageMaker Processing: run a preprocessing script over the data
• SageMaker Spark Library (SageMaker Spark framework), which provides:
  • SageMakerEstimator
  • KMeansSageMakerEstimator
  • PCASageMakerEstimator
  • XGBoostSageMakerEstimator
  • SageMakerModel
  • …
• EMR with SageMaker Pipeline
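For the first option, the SageMaker Python SDK provides `PySparkProcessor`, which runs a PySpark script as a SageMaker Processing job on a Spark cluster that exists only for the duration of the job. The sketch below assumes a script named `preprocess.py`, an IAM role ARN, and S3 URIs that you would supply; the `--input`/`--output` flags are a hypothetical convention the script would parse, and the instance settings and Spark `framework_version` are illustrative values to check against the SDK documentation. The SDK import is deferred into the function so the argument-building helper can be checked without the SDK installed.

```python
from typing import List


def build_job_arguments(input_uri: str, output_uri: str) -> List[str]:
    """Command-line arguments passed through to the PySpark script (hypothetical flags)."""
    for uri in (input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an S3 URI, got {uri!r}")
    return ["--input", input_uri, "--output", output_uri]


def run_spark_preprocessing(role_arn: str, input_uri: str, output_uri: str,
                            script: str = "preprocess.py") -> None:
    # Requires the `sagemaker` package and AWS credentials at call time.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="spark-preprocess",
        framework_version="3.1",       # Spark version of the managed container (assumed)
        role=role_arn,
        instance_count=2,              # cluster size for this job only
        instance_type="ml.m5.xlarge",
        max_runtime_in_seconds=1200,
    )
    # Blocks until the job finishes (wait=True is the SDK default).
    processor.run(
        submit_app=script,
        arguments=build_job_arguments(input_uri, output_uri),
    )
```

Compared with the EMR route in the demos, this trades interactive exploration for a batch job with no cluster to manage, which suits preprocessing steps that are already settled.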
References
• SageMaker Studio EMR Integration example code - https://github.com/aws-samples/sagemaker-studio-emr
• SageMaker Studio integration with EMR Workshop - https://catalog.workshops.aws/sagemaker-studio-emr/en-US
• Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.ipynb (rendered: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.html)
• Create and manage Amazon EMR clusters from SageMaker Studio to run interactive Spark and ML workloads - https://aws.amazon.com/ko/blogs/machine-learning/part-1-create-and-manage-amazon-emr-clusters-from-sagemaker-studio-to-run-interactive-spark-and-ml-workloads/
• Prepare data at scale with SageMaker Studio notebooks - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-emr-cluster.html
• Connect SageMaker Studio Notebooks in a VPC to External Resources - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html
• Apache Livy - https://livy.apache.org/
• SparkMagic - https://github.com/jupyter-incubator/sparkmagic
• Use Apache Spark with Amazon SageMaker - https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html
• Amazon SageMaker Processing (with Spark) - https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#amazon-sagemaker-processing
• SageMaker Pipeline Step (with EMR) - https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html
Thank you!
강성문
kseongmo@amazon.com

More Related Content

What's hot

AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
Amazon Web Services Korea
 

What's hot (20)

LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
 
AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
AWS DMS를 통한 오라클 DB 마이그레이션 방법 - AWS Summit Seoul 2017
 
DMS와 SCT를 활용한 Oracle에서 Open Source DB로의 전환
DMS와 SCT를 활용한 Oracle에서 Open Source DB로의 전환DMS와 SCT를 활용한 Oracle에서 Open Source DB로의 전환
DMS와 SCT를 활용한 Oracle에서 Open Source DB로의 전환
 
AWS Aurora 100% 활용하기
AWS Aurora 100% 활용하기AWS Aurora 100% 활용하기
AWS Aurora 100% 활용하기
 
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
사례로 알아보는 Database Migration Service : 데이터베이스 및 데이터 이관, 통합, 분리, 분석의 도구 - 발표자: ...
 
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
효율적인 빅데이터 분석 및 처리를 위한 Glue, EMR 활용 - 김태현 솔루션즈 아키텍트, AWS :: AWS Summit Seoul 2019
 
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
 
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Lake Formation을 통한 손쉬운 데이터 레이크 구성 및 관리 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
Amazon OpenSearch Deep dive - 내부구조, 성능최적화 그리고 스케일링
 
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
 
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
 
세션 3: IT 담당자를 위한 Cloud 로의 전환
세션 3: IT 담당자를 위한 Cloud 로의 전환세션 3: IT 담당자를 위한 Cloud 로의 전환
세션 3: IT 담당자를 위한 Cloud 로의 전환
 
KB국민카드 - 클라우드 기반 분석 플랫폼 혁신 여정 - 발표자: 박창용 과장, 데이터전략본부, AI혁신부, KB카드│강병억, Soluti...
KB국민카드 - 클라우드 기반 분석 플랫폼 혁신 여정 - 발표자: 박창용 과장, 데이터전략본부, AI혁신부, KB카드│강병억, Soluti...KB국민카드 - 클라우드 기반 분석 플랫폼 혁신 여정 - 발표자: 박창용 과장, 데이터전략본부, AI혁신부, KB카드│강병억, Soluti...
KB국민카드 - 클라우드 기반 분석 플랫폼 혁신 여정 - 발표자: 박창용 과장, 데이터전략본부, AI혁신부, KB카드│강병억, Soluti...
 
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
20200623 AWS Black Belt Online Seminar Amazon Elasticsearch Service
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
 
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
 
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
 
Amazon Timestream 시계열 데이터 전용 DB 소개 :: 변규현 - AWS Community Day 2019
Amazon Timestream 시계열 데이터 전용 DB 소개 :: 변규현 - AWS Community Day 2019Amazon Timestream 시계열 데이터 전용 DB 소개 :: 변규현 - AWS Community Day 2019
Amazon Timestream 시계열 데이터 전용 DB 소개 :: 변규현 - AWS Community Day 2019
 
Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...
Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...
Amazon DocumentDB - Architecture 및 Best Practice (Level 200) - 발표자: 장동훈, Sr. ...
 
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
데이터 분석가를 위한 신규 분석 서비스 - 김기영, AWS 분석 솔루션즈 아키텍트 / 변규현, 당근마켓 소프트웨어 엔지니어 :: AWS r...
 

Similar to Amazon EMR과 SageMaker를 이용하여 데이터를 준비하고 머신러닝 모델 개발 하기

Opinionated re:Invent recap with AWS Heroes & Builders
Opinionated re:Invent recap with AWS Heroes & BuildersOpinionated re:Invent recap with AWS Heroes & Builders
Opinionated re:Invent recap with AWS Heroes & Builders
Daniel Zivkovic
 

Similar to Amazon EMR과 SageMaker를 이용하여 데이터를 준비하고 머신러닝 모델 개발 하기 (20)

AWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and DataAWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and Data
 
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
 
AWS Advanced Analytics Automation Toolkit (AAA)
AWS Advanced Analytics Automation Toolkit (AAA)AWS Advanced Analytics Automation Toolkit (AAA)
AWS Advanced Analytics Automation Toolkit (AAA)
 
Machine Learning with Amazon SageMaker
Machine Learning with Amazon SageMakerMachine Learning with Amazon SageMaker
Machine Learning with Amazon SageMaker
 
Supercharge Your Machine Learning Solutions with Amazon SageMaker
Supercharge Your Machine Learning Solutions with Amazon SageMakerSupercharge Your Machine Learning Solutions with Amazon SageMaker
Supercharge Your Machine Learning Solutions with Amazon SageMaker
 
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMaker
 
20 ways event-driven architectures can improve your development - Copy.pptx
20 ways event-driven architectures can improve your development - Copy.pptx20 ways event-driven architectures can improve your development - Copy.pptx
20 ways event-driven architectures can improve your development - Copy.pptx
 
Amazon SageMaker
Amazon SageMakerAmazon SageMaker
Amazon SageMaker
 
apidays Paris 2022 - Optimizing architectures for sustainability, Rudy Krol, AWS
apidays Paris 2022 - Optimizing architectures for sustainability, Rudy Krol, AWSapidays Paris 2022 - Optimizing architectures for sustainability, Rudy Krol, AWS
apidays Paris 2022 - Optimizing architectures for sustainability, Rudy Krol, AWS
 
Frome Code to Cloud: Exploring AWS CDK for Infrastructure Management
Frome Code to Cloud: Exploring AWS CDK for Infrastructure ManagementFrome Code to Cloud: Exploring AWS CDK for Infrastructure Management
Frome Code to Cloud: Exploring AWS CDK for Infrastructure Management
 
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
 
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
 
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
 
Easily Label Training Data For Machine Learning At Scale.pptx
Easily Label Training Data For Machine Learning At Scale.pptxEasily Label Training Data For Machine Learning At Scale.pptx
Easily Label Training Data For Machine Learning At Scale.pptx
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
 
Opinionated re:Invent recap with AWS Heroes & Builders
Opinionated re:Invent recap with AWS Heroes & BuildersOpinionated re:Invent recap with AWS Heroes & Builders
Opinionated re:Invent recap with AWS Heroes & Builders
 
Building Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWSBuilding Modern Streaming Analytics with Confluent on AWS
Building Modern Streaming Analytics with Confluent on AWS
 

More from Amazon Web Services Korea

More from Amazon Web Services Korea (20)

AWS Modern Infra with Storage Roadshow 2023 - Day 2
AWS Modern Infra with Storage Roadshow 2023 - Day 2AWS Modern Infra with Storage Roadshow 2023 - Day 2
AWS Modern Infra with Storage Roadshow 2023 - Day 2
 
AWS Modern Infra with Storage Roadshow 2023 - Day 1
AWS Modern Infra with Storage Roadshow 2023 - Day 1AWS Modern Infra with Storage Roadshow 2023 - Day 1
AWS Modern Infra with Storage Roadshow 2023 - Day 1
 
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...
Amazon Elasticache - Fully managed, Redis & Memcached Compatible Service (Lev...
 
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
Internal Architecture of Amazon Aurora (Level 400) - 발표자: 정달영, APAC RDS Speci...
 
[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...
[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...
[Keynote] 슬기로운 AWS 데이터베이스 선택하기 - 발표자: 강민석, Korea Database SA Manager, WWSO, A...
 
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
Demystify Streaming on AWS - 발표자: 이종혁, Sr Analytics Specialist, WWSO, AWS :::...
 
Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...
Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...
Amazon EMR - Enhancements on Cost/Performance, Serverless - 발표자: 김기영, Sr Anal...
 
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...
Amazon OpenSearch - Use Cases, Security/Observability, Serverless and Enhance...
 
Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...
Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...
Enabling Agility with Data Governance - 발표자: 김성연, Analytics Specialist, WWSO,...
 
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
Amazon Redshift Deep Dive - Serverless, Streaming, ML, Auto Copy (New feature...
 
From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...From Insights to Action, How to build and maintain a Data Driven Organization...
From Insights to Action, How to build and maintain a Data Driven Organization...
 
[Keynote] Accelerating Business Outcomes with AWS Data - 발표자: Saeed Gharadagh...
[Keynote] Accelerating Business Outcomes with AWS Data - 발표자: Saeed Gharadagh...[Keynote] Accelerating Business Outcomes with AWS Data - 발표자: Saeed Gharadagh...
[Keynote] Accelerating Business Outcomes with AWS Data - 발표자: Saeed Gharadagh...
 
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
 
LG전자 - Amazon Aurora 및 RDS 블루/그린 배포를 이용한 데이터베이스 업그레이드 안정성 확보 - 발표자: 이은경 책임, L...
LG전자 - Amazon Aurora 및 RDS 블루/그린 배포를 이용한 데이터베이스 업그레이드 안정성 확보 - 발표자: 이은경 책임, L...LG전자 - Amazon Aurora 및 RDS 블루/그린 배포를 이용한 데이터베이스 업그레이드 안정성 확보 - 발표자: 이은경 책임, L...
LG전자 - Amazon Aurora 및 RDS 블루/그린 배포를 이용한 데이터베이스 업그레이드 안정성 확보 - 발표자: 이은경 책임, L...
 
SK Telecom - 망관리 프로젝트 TANGO의 오픈소스 데이터베이스 전환 여정 - 발표자 : 박승전, Project Manager, ...
SK Telecom - 망관리 프로젝트 TANGO의 오픈소스 데이터베이스 전환 여정 - 발표자 : 박승전, Project Manager, ...SK Telecom - 망관리 프로젝트 TANGO의 오픈소스 데이터베이스 전환 여정 - 발표자 : 박승전, Project Manager, ...
SK Telecom - 망관리 프로젝트 TANGO의 오픈소스 데이터베이스 전환 여정 - 발표자 : 박승전, Project Manager, ...
 
코리안리 - 데이터 분석 플랫폼 구축 여정, 그 시작과 과제 - 발표자: 김석기 그룹장, 데이터비즈니스센터, 메가존클라우드 ::: AWS ...
코리안리 - 데이터 분석 플랫폼 구축 여정, 그 시작과 과제 - 발표자: 김석기 그룹장, 데이터비즈니스센터, 메가존클라우드 ::: AWS ...코리안리 - 데이터 분석 플랫폼 구축 여정, 그 시작과 과제 - 발표자: 김석기 그룹장, 데이터비즈니스센터, 메가존클라우드 ::: AWS ...
코리안리 - 데이터 분석 플랫폼 구축 여정, 그 시작과 과제 - 발표자: 김석기 그룹장, 데이터비즈니스센터, 메가존클라우드 ::: AWS ...
 
[Keynote] Data Driven Organizations with AWS Data - 발표자: Agnes Panosian, Head...
[Keynote] Data Driven Organizations with AWS Data - 발표자: Agnes Panosian, Head...[Keynote] Data Driven Organizations with AWS Data - 발표자: Agnes Panosian, Head...
[Keynote] Data Driven Organizations with AWS Data - 발표자: Agnes Panosian, Head...
 
AWS Summit Seoul 2023 | Amazon Neptune 및 Elastic을 이용한 추천 서비스 및 검색 플랫폼 구축하기
AWS Summit Seoul 2023 | Amazon Neptune 및 Elastic을 이용한 추천 서비스 및 검색 플랫폼 구축하기AWS Summit Seoul 2023 | Amazon Neptune 및 Elastic을 이용한 추천 서비스 및 검색 플랫폼 구축하기
AWS Summit Seoul 2023 | Amazon Neptune 및 Elastic을 이용한 추천 서비스 및 검색 플랫폼 구축하기
 
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
 
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...

Preparing Data and Developing Machine Learning Models with Amazon EMR and SageMaker

  • 1. AWS FOR DATA WEBINAR – SAGEMAKER WITH EMR © 2022, Amazon Web Services, Inc. or its affiliates. Preparing Data and Developing Machine Learning Models with Amazon EMR and SageMaker. AWS FOR DATA WEBINAR. 강성문, Sr. AI/ML Specialist Solutions Architect, AWS
  • 2. Agenda: SageMaker vs. EMR; large-scale data preparation and machine learning model development with EMR and SageMaker (▪ Demo 1. Environment setup ▪ Demo 2. Machine learning model development); summary
  • 3. How do Amazon SageMaker and EMR differ?
  • 4. Amazon EMR (Elastic MapReduce)
  • 5. Amazon SageMaker
    • PREPARE: SageMaker Ground Truth (label training data for machine learning); SageMaker Data Wrangler (aggregate and prepare data for machine learning); SageMaker Processing (built-in Python, BYO R/Spark); SageMaker Feature Store (store, update, retrieve, and share features); SageMaker Clarify (detect bias and understand model predictions)
    • BUILD: SageMaker Studio notebooks (Jupyter notebooks with elastic compute and sharing); built-in and bring-your-own algorithms (dozens of optimized algorithms, or bring your own); Local mode (test and prototype on your local machine); SageMaker Autopilot (automatically create machine learning models with full visibility); SageMaker JumpStart (pre-built solutions for common use cases)
    • TRAIN & TUNE: one-click training (distributed infrastructure management); SageMaker Experiments (capture, organize, and compare every step); automatic model tuning (hyperparameter optimization); distributed training libraries (training for large datasets and models); SageMaker Debugger (debug and profile training runs); managed spot training (reduce training cost by up to 90%)
    • DEPLOY & MANAGE: fully managed deployment (fully managed, ultra-low latency, high throughput); Kubernetes & Kubeflow integration (simplify Kubernetes-based machine learning); multi-model endpoints (reduce cost by hosting multiple models per instance); SageMaker Model Monitor (maintain accuracy of deployed models); SageMaker Edge Manager (manage and monitor models on edge devices); SageMaker Pipelines (workflow orchestration and automation)
    • SageMaker Studio: integrated development environment (IDE) for ML
    • Not a comprehensive list. Visit aws.amazon.com/sagemaker for the latest information.
  • 6. Machine learning cycle (diagram). Stages: business problem, ML problem framing, data collection, data integration, data preparation and cleaning, data visualization and analysis, feature engineering, model training and parameter tuning, model evaluation, monitoring and debugging, model deployment, predictions; the diagram also contains a YES/NO decision branch.
  • 7. Build and train models using SageMaker (machine learning cycle diagram, as in slide 6)
  • 8. Manage data on AWS (machine learning cycle diagram, as in slide 6)
  • 9. Example scenario: a request to preprocess large-scale data, followed by model development using the preprocessing results
  • 10. Large-scale data preparation and machine learning model development with EMR and SageMaker
  • 11. Target system architecture (diagram, steps 1 and 2)
  • 12. Component 1 – SageMaker Studio notebooks
  • 13. Component 1 – SageMaker Studio notebooks
  • 14. Component 2 – AWS Service Catalog: a user's custom product list of VMs, containers, and services. ✓ Compliance with internal company policies ✓ One-click deployment ✓ Automated resource tagging ✓ Budget management. Diagram labels: AWS Service Catalog, user, admin, Bitnami Certified App: WordPress.
  • 15. Component 2 – AWS Service Catalog. Administrator side (AWS Service Catalog Administrator): Product (IT services and resources, defined in JSON, YAML, or Terraform); Portfolio (a collection of products); Constraint (security, governance, and deployment control). End-user side (AWS Service Catalog End User): Products list (browse the products you are allowed to use); Provisioned products (create and run the services and resources).
  • 16. Demo 1 [for platform engineers]: setting up an environment for creating and connecting to EMR clusters from SageMaker Studio
  • 17. (no text on this slide)
  • 18. Target system architecture (diagram, steps 1, 2, and 3)
  • 19. Component 3 – Apache Livy and SparkMagic (https://livy.apache.org/)
  • 20. Component 3 – Apache Livy and SparkMagic (https://github.com/jupyter-incubator/sparkmagic)
  • 21. Component 3 – Apache Livy and SparkMagic
  • 22. Demo 2 [for data scientists]: connecting to EMR from SageMaker Studio to prepare data and develop a machine learning model
  • 23. (no text on this slide)
  • 24. Summary
  • 25. Build and train models using SageMaker (machine learning cycle diagram, as in slide 6)
  • 26. Target system architecture (diagram, steps 1, 2, and 3)
  • 27. Other ways to use Spark with SageMaker: SageMaker Processing (diagram: data → preprocessing script on the SageMaker Spark framework → data); the SageMaker Spark library (• SageMakerEstimator • KMeansSageMakerEstimator • PCASageMakerEstimator • XGBoostSageMakerEstimator • SageMakerModel • …); EMR with SageMaker Pipelines
  • 28. References
    • SageMaker Studio EMR Integration example code - https://github.com/aws-samples/sagemaker-studio-emr
    • SageMaker Studio integration with EMR Workshop - https://catalog.workshops.aws/sagemaker-studio-emr/en-US
    • Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.ipynb
    • Create and manage Amazon EMR clusters from SageMaker Studio to run interactive Spark and ML workloads - https://aws.amazon.com/ko/blogs/machine-learning/part-1-create-and-manage-amazon-emr-clusters-from-sagemaker-studio-to-run-interactive-spark-and-ml-workloads/
    • Prepare data at scale with SageMaker Studio notebooks - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-emr-cluster.html
    • Connect SageMaker Studio Notebooks in a VPC to External Resources - https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html
    • Apache Livy - https://livy.apache.org/
    • Spark Magic - https://github.com/jupyter-incubator/sparkmagic
    • Use Apache Spark with Amazon SageMaker - https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark.html
    • Amazon SageMaker Processing (with Spark) - https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#amazon-sagemaker-processing
    • Train an ML Model using Apache Spark in EMR and deploy in SageMaker - https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.html
    • SageMaker Pipeline Step (with EMR) - https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html
  • 29. Thank you! 강성문, kseongmo@amazon.com
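Slides 19 to 21 name Apache Livy and SparkMagic as the pieces that let a Studio notebook drive Spark on EMR: SparkMagic talks to the remote cluster through Livy's REST API (create a session, submit statements, read their output). The following is a minimal standard-library sketch of that exchange, not the Studio integration's actual code. It assumes Livy runs on the EMR primary node at its default port 8998, is reachable from the notebook's network, and requires no authentication (Kerberos or LDAP would need extra handling). The helper names `livy_request`, `send`, `wait_for_state`, and `run_pyspark` are hypothetical.

```python
import json
import time
import urllib.request


def livy_request(base_url, method, path, body=None):
    """Build (but do not send) a urllib Request for one Livy REST call."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(req, timeout=30):
    """Send a request and decode Livy's JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_state(base_url, path, done, failed, poll_seconds=2.0, timeout=900):
    """Poll GET <path> until its "state" is in `done`; raise on `failed` or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        obj = send(livy_request(base_url, "GET", path))
        state = obj.get("state")
        if state in done:
            return obj
        if state in failed:
            raise RuntimeError(f"{path} ended in state {state!r}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"{path} still {state!r} after {timeout}s")
        time.sleep(poll_seconds)


def run_pyspark(base_url, code):
    """Create a PySpark session, run one statement, return its output, then clean up."""
    session = send(livy_request(base_url, "POST", "/sessions", {"kind": "pyspark"}))
    sid = session["id"]
    try:
        # A new session moves from "starting" to "idle" once the Spark driver is up.
        wait_for_state(base_url, f"/sessions/{sid}", {"idle"}, {"error", "dead", "killed"})
        stmt = send(livy_request(base_url, "POST", f"/sessions/{sid}/statements", {"code": code}))
        # A statement moves through "waiting" and "running" to "available".
        result = wait_for_state(
            base_url,
            f"/sessions/{sid}/statements/{stmt['id']}",
            {"available"},
            {"error", "cancelled"},
        )
        # output["status"] is "ok" or "error"; on success the printed value is
        # in output["data"]["text/plain"].
        return result["output"]
    finally:
        send(livy_request(base_url, "DELETE", f"/sessions/{sid}"))
```

For example, `run_pyspark("http://<emr-primary-dns>:8998", "spark.range(1000).count()")` would return an `output` dict whose `data["text/plain"]` holds the printed count, assuming the session starts successfully. SparkMagic wraps the same session and statement cycle behind notebook magics and kernels, and keeps one session open across cells rather than deleting it after each statement.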
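Slides 14 and 15 describe the Service Catalog model behind Demo 1: an administrator publishes a product (for example, an EMR cluster template) in a portfolio, and an end user launches it as a provisioned product. The sketch below shows roughly what that launch looks like through boto3's Service Catalog client instead of the Studio UI. It is an illustration under assumptions: the product and provisioning-artifact IDs, the provisioned-product name, and the parameter keys used in the usage note are hypothetical placeholders, and the real parameter keys are whatever the administrator's template declares. The helper names `build_provision_request` and `provision_and_wait` are also hypothetical.

```python
import time


def build_provision_request(product_id, artifact_id, provisioned_name, parameters):
    """Build keyword arguments for the boto3 servicecatalog provision_product call."""
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,  # the product version to launch
        "ProvisionedProductName": provisioned_name,
        # Keys must match the parameters declared in the product's template;
        # sorted here only to keep the request deterministic.
        "ProvisioningParameters": [
            {"Key": key, "Value": str(value)} for key, value in sorted(parameters.items())
        ],
    }


def provision_and_wait(request, poll_seconds=15, region_name=None):
    """Launch the product, then poll its record until provisioning finishes."""
    import boto3  # requires boto3 and credentials allowed to launch from the portfolio

    client = boto3.client("servicecatalog", region_name=region_name)
    record_id = client.provision_product(**request)["RecordDetail"]["RecordId"]
    while True:
        status = client.describe_record(Id=record_id)["RecordDetail"]["Status"]
        if status not in ("CREATED", "IN_PROGRESS", "IN_PROGRESS_IN_ERROR"):
            return status  # "SUCCEEDED" or "FAILED"
        time.sleep(poll_seconds)
```

A usage sketch with placeholder values: `provision_and_wait(build_provision_request("prod-example123", "pa-example456", "emr-demo-cluster", {"CoreInstanceCount": 2}))`. Building the request separately from sending it keeps the parameter mapping testable without AWS credentials; boto3 fills in the required `ProvisionToken` idempotency token when it is omitted.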