SlideShare une entreprise Scribd logo
1  sur  16
The Data Platform Administration
Handling the 100 PB
May 19th, 2022
Yongduck Lee
Cloud Platform Department
Rakuten Group, Inc.
2
About me
Lecture History
- Colloquium Lecturer at KAIST
Program Committee
- BigComp2017/2019
- EDB 2016
Certification
- Certified Scrum Master (CSM)
- Certified Project Management Professional (PMP #1255421)
… ETC
Lee Yongduck Daniel
A Vice Section Manager and Senior Architect at Data Storage and
Processing Section in Rakuten Group, Inc.
Started as Recommendation Engine Developer and now is focusing on
researching and verifying new Big Data Technology and how to support
users who want to use Big Data System.
B.Sc in Korea University in 2001.
21 years in Japan and have been worked for many organization and
company such as NHK, NTTD and Rakuten Group, Inc.
3
CONTENTS
1. Global Internet & Data Explosion
2. Data in Rakuten
3. Data platform & Big Data Administrator in Rakuten
4. What Advantages as Engineer in Rakuten
4
Internet & Globalization
The Internet is the global system of interconnected computer networks that use the Internet protocol
suite (TCP/IP) to link devices worldwide. It is a network of networks that consists of private, public, academic,
business, and government networks of local to global scope, linked by a broad array of electronic, wireless,
and optical networking technologies
G
C
Vast
Unstructured 80%
Structured 20%
35.2 ZB in 2020
The origins of the Internet date back to research
commissioned by the federal government of the
United States in the 1960s to build robust, fault-
tolerant communication with computer networks.
https://en.wikipedia.org/wiki/Internet#World_Wide_Web
* From IDC white paper & EMC
hances
Lobalization
Information
Structure Volume
5
Internet Users
Internet users are defined as persons who accessed the Internet in the last 12 months from any device,
including mobile phones.
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14
6
Internet Users
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14
In Japan 92.3% are using Internet ( Population 127,202,192 / Internet Users 117,400,000 )
At 2018
7
8
9
The Big Data in Rakuten
There are huge potential value and possibilities due to Diversity of Service and Users not
only from Japan but also Global. It is very interesting and ideal environment for Data
Scientiest and Data Analyst.
Increase synergy effect on personalization, clustering, segmentation, etc. by combining
data from various services.
The large volume of data every day, every month, and every year from services and users.
It is a big challenge to store data and make it easy to utilize for data users as System
Infrastructure Engineer and Data Engineer.
Diversity and Synergy
Scale
10
Rakuten Hadoop and Kafka
Supporting near-realtime & streaming processing in
each region.
Handling data totally around 1.3 Million Message/sec
( 10 GB/sec IN/OUT) around peak time at normal
date.
At 2021 Super Sale, we handled more than 2.5 times
messages and traffics.
Supporting Data Lake, Data Mart, and Data Analysis
for Rakuten Service in each region.
Lots of value mining from big data are being done by
data scientist and contributing on Rakuten Service.
Kafka: 800 Core, 20TB Mem, 4728 Topics
Hadoop : 80K Core, 600 TB Mem, 160K TB Disk
11
The Challenge on Administration
12
The Big Data in Rakuten
Platform/Middleware
Administrator
Users
Project/Product
Manager
Big Data Platform
Administrator
Infra/Server
Administrator
Network
Administrator
Software/System
Architect
Software
Developer
13
Administration Use CASE (HBase)
User reported performance issues on HBase but there were no issues or report from other users who are using
other component on Hadoop.
Confirm Way to get/put data on HBase
• HBase
Configuration
Architecture, Work/Dataflow.
Application/GC Logs
• Dependency Component (*HDFS)
READ/Write Performance Logs
Application/GC Logs
• DISK/Mem/CPU Load
• Kernel Log
• Network Connection
Date
&
Time
Matching
Data Hot Spotting.
Data or Configuration Caching
HDFS
JVM Config change
Increasing Handler
Increasing Scanner Interval
HW Improvement
Master Node Replacement
Reduced RegionServers
Move HDD to NVMe
Dedicated RegionServers
OS Configuration
Root noprocs, nofiles increasing on Dedicated RS
HBASE
TCPNoDelay, Parallel Seeking , Master Table Locality
WRITE/Short-READ/Long-READ Queue
DEADLINE Scheduler, Hedged Reads, Short Circuit READ
14
What Advantages in Rakuten as Data Engineer
You can go through all necessary domains of Big Data Platform to get rich experience for Big Data Platform
Administrators. Rakuten has experts who have rich knowledges and experiences on each technical and
management domain.
15
What Advantages in Rakuten as Data Engineer
You can also work with various stakeholders from various service domain, from the point of data utilization.
DB
Services
Event
INFRA
…
The Data Platform Administration Handling the 100 PB.pdf

Contenu connexe

Tendances

大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開Rakuten Group, Inc.
 
楽天サービスとインフラ部隊
楽天サービスとインフラ部隊楽天サービスとインフラ部隊
楽天サービスとインフラ部隊Rakuten Group, Inc.
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用Rakuten Group, Inc.
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfRakuten Group, Inc.
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理NTT DATA Technology & Innovation
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のりRakuten Group, Inc.
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みRakuten Group, Inc.
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Mineaki Motohashi
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfRakuten Group, Inc.
 
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten, core skills neede...
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten,  core skills  neede...楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten,  core skills  neede...
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten, core skills neede...Rakuten Group, Inc.
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~NTT DATA OSS Professional Services
 
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTT DATA Technology & Innovation
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方Yoshiyasu SAEKI
 
JJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みJJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みRecruit Technologies
 

Tendances (20)

大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
 
楽天サービスとインフラ部隊
楽天サービスとインフラ部隊楽天サービスとインフラ部隊
楽天サービスとインフラ部隊
 
Rakuten Platform
Rakuten PlatformRakuten Platform
Rakuten Platform
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdf
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組み
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
 
KafkaとPulsar
KafkaとPulsarKafkaとPulsar
KafkaとPulsar
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten, core skills neede...
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten,  core skills  neede...楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten,  core skills  neede...
楽天市場で使われている技術、エンジニアに必要なコアスキルとはTechnology used in Rakuten, core skills neede...
 
楽天がHadoopを使う理由
楽天がHadoopを使う理由楽天がHadoopを使う理由
楽天がHadoopを使う理由
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
 
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
NTTデータが考えるデータ基盤の次の一手 ~AI活用のために知っておくべき新潮流とは?~(NTTデータ テクノロジーカンファレンス 2020 発表資料)
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
JJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組みJJUG CCC リクルートの Java に対する取り組み
JJUG CCC リクルートの Java に対する取り組み
 
Yahoo! JAPANを支えるビッグデータプラットフォーム技術
Yahoo! JAPANを支えるビッグデータプラットフォーム技術Yahoo! JAPANを支えるビッグデータプラットフォーム技術
Yahoo! JAPANを支えるビッグデータプラットフォーム技術
 

Similaire à The Data Platform Administration Handling the 100 PB.pdf

Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET Journal
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OSCuneyt Goksu
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Societyconfluent
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2Joe_F
 
Idc analyst report a new breed of servers for digital transformation
Idc analyst report a new breed of servers for digital transformationIdc analyst report a new breed of servers for digital transformation
Idc analyst report a new breed of servers for digital transformationKaizenlogcom
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo
 
Deploying cost effective cloud data center
Deploying cost effective cloud data centerDeploying cost effective cloud data center
Deploying cost effective cloud data centerWiudo Laos
 
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)MarketingArrowECS_CZ
 
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim IJECEIAES
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET Journal
 

Similaire à The Data Platform Administration Handling the 100 PB.pdf (20)

Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data Centers
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
592-1627-1-PB
592-1627-1-PB592-1627-1-PB
592-1627-1-PB
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
 
Bigdata-Intro.pptx
Bigdata-Intro.pptxBigdata-Intro.pptx
Bigdata-Intro.pptx
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
Idc analyst report a new breed of servers for digital transformation
Idc analyst report a new breed of servers for digital transformationIdc analyst report a new breed of servers for digital transformation
Idc analyst report a new breed of servers for digital transformation
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
 
Deploying cost effective cloud data center
Deploying cost effective cloud data centerDeploying cost effective cloud data center
Deploying cost effective cloud data center
 
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
Oracle databáze - zkonsolidovat, ochránit a ještě ušetřit! (1. část)
 
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
Virtual Machine Allocation Policy in Cloud Computing Environment using CloudSim
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
 
Resume (1)
Resume (1)Resume (1)
Resume (1)
 
Resume (1)
Resume (1)Resume (1)
Resume (1)
 

Plus de Rakuten Group, Inc.

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話Rakuten Group, Inc.
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Rakuten Group, Inc.
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technologyRakuten Group, Inc.
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャーRakuten Group, Inc.
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container ChallengesRakuten Group, Inc.
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Rakuten Group, Inc.
 
アジャイル開発とメトリクス
アジャイル開発とメトリクスアジャイル開発とメトリクス
アジャイル開発とメトリクスRakuten Group, Inc.
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Rakuten Group, Inc.
 
Improve test automation operation
Improve test automation operationImprove test automation operation
Improve test automation operationRakuten Group, Inc.
 

Plus de Rakuten Group, Inc. (12)

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
 
What Makes Software Green?
What Makes Software Green?What Makes Software Green?
What Makes Software Green?
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
 
OWASPTop10_Introduction
OWASPTop10_IntroductionOWASPTop10_Introduction
OWASPTop10_Introduction
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technology
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container Challenges
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
 
アジャイル開発とメトリクス
アジャイル開発とメトリクスアジャイル開発とメトリクス
アジャイル開発とメトリクス
 
AR/SLAM and IoT
AR/SLAM and IoTAR/SLAM and IoT
AR/SLAM and IoT
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2
 
Improve test automation operation
Improve test automation operationImprove test automation operation
Improve test automation operation
 

Dernier

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxmprakaash5
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdfHulkTheDevil
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfROWELL MARQUINA
 
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...BookNet Canada
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023Joshua Flannery
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?Rustici Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Women in Automation 2024: Career session - explore career paths in automation
Women in Automation 2024: Career session - explore career paths in automationWomen in Automation 2024: Career session - explore career paths in automation
Women in Automation 2024: Career session - explore career paths in automationDianaGray10
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...stockholm university
 
Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Memoori
 
Tecnogravura, Cylinder Engraving for Rotogravure
Tecnogravura, Cylinder Engraving for RotogravureTecnogravura, Cylinder Engraving for Rotogravure
Tecnogravura, Cylinder Engraving for RotogravureAntonio de Llamas
 

Dernier (20)

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptx
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdf
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
 
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
BoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another CenturyBoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another Century
 
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
THE STATE OF STARTUP ECOSYSTEM - INDIA x JAPAN 2023
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Women in Automation 2024: Career session - explore career paths in automation
Women in Automation 2024: Career session - explore career paths in automationWomen in Automation 2024: Career session - explore career paths in automation
Women in Automation 2024: Career session - explore career paths in automation
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...
 
Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!Laying the Data Foundations for Artificial Intelligence!
Laying the Data Foundations for Artificial Intelligence!
 
Tecnogravura, Cylinder Engraving for Rotogravure
Tecnogravura, Cylinder Engraving for RotogravureTecnogravura, Cylinder Engraving for Rotogravure
Tecnogravura, Cylinder Engraving for Rotogravure
 

The Data Platform Administration Handling the 100 PB.pdf

  • 1. The Data Platform Administration Handling the 100 PB May 19th, 2022 Yongduck Lee Cloud Platform Department Rakuten Group, Inc.
  • 2. 2 About me Lecture History - Colloquium Lecturer at KAIST Program Committee - BigComp2017/2019 - EDB 2016 Certification - Certified Scrum Master (CSM) - Certified Project Management Professional (PMP #1255421) … ETC Lee Yongduck Daniel A Vice Section Manager and Senior Architect at Data Storage and Processing Section in Rakuten Group, Inc. Started as Recommendation Engine Developer and now is focusing on researching and verifying new Big Data Technology and how to support users who want to use Big Data System. B.Sc in Korea University in 2001. 21 years in Japan and have been worked for many organization and company such as NHK, NTTD and Rakuten Group, Inc.
  • 3. 3 CONTENTS 1. Global Internet & Data Explosion 2. Data in Rakuten 3. Data platform & Big Data Administrator in Rakuten 4. What Advantages as Engineer in Rakuten
  • 4. 4 Internet & Globalization The Internet is the global system of interconnected computer networks that use the Internet protocol suite (TCP/IP) to link devices worldwide. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies G C Vast Unstructured 80% Structured 20% 35.2 ZB in 2020 The origins of the Internet date back to research commissioned by the federal government of the United States in the 1960s to build robust, fault- tolerant communication with computer networks. https://en.wikipedia.org/wiki/Internet#World_Wide_Web * From IDC white paper & EMC hances Lobalization Information Structure Volume
  • 5. 5 Internet Users Internet users are defined as persons who accessed the Internet in the last 12 months from any device, including mobile phones. https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14
  • 6. 6 Internet Users https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14 In Japan 92.3% are using Internet ( Population 127,202,192 / Internet Users 117,400,000 ) At 2018
  • 7. 7
  • 8. 8
  • 9. 9 The Big Data in Rakuten There are huge potential value and possibilities due to Diversity of Service and Users not only from Japan but also Global. It is very interesting and ideal environment for Data Scientiest and Data Analyst. Increase synergy effect on personalization, clustering, segmentation, etc. by combining data from various services. The large volume of data every day, every month, and every year from services and users. It is a big challenge to store data and make it easy to utilize for data users as System Infrastructure Engineer and Data Engineer. Diversity and Synergy Scale
  • 10. 10 Rakuten Hadoop and Kafka Supporting near-realtime & streaming processing in each region. Handling data totally around 1.3 Million Message/sec ( 10 GB/sec IN/OUT) around peak time at normal date. At 2021 Super Sale, we handled more than 2.5 times messages and traffics. Supporting Data Lake, Data Mart, and Data Analysis for Rakuten Service in each region. Lots of value mining from big data are being done by data scientist and contributing on Rakuten Service. Kafka: 800 Core, 20TB Mem, 4728 Topics Hadoop : 80K Core, 600 TB Mem, 160K TB Disk
  • 11. 11 The Challenge on Administration
  • 12. 12 The Big Data in Rakuten Platform/Middleware Administrator Users Project/Product Manager Big Data Platform Administrator Infra/Server Administrator Network Administrator Software/System Architect Software Developer
  • 13. 13 Administration Use CASE (HBase) User reported performance issues on HBase but there were no issues or report from other users who are using other component on Hadoop. Confirm Way to get/put data on HBase • HBase Configuration Architecture, Work/Dataflow. Application/GC Logs • Dependency Component (*HDFS) READ/Write Performance Logs Application/GC Logs • DISK/Mem/CPU Load • Kernel Log • Network Connection Date & Time Matching Data Hot Spotting. Data or Configuration Caching HDFS JVM Config change Increasing Handler Increasing Scanner Interval HW Improvement Master Node Replacement Reduced RegionServers Move HDD to NVMe Dedicated RegionServers OS Configuration Root noprocs, nofiles increasing on Dedicated RS HBASE TCPNoDelay, Parallel Seeking , Master Table Locality WRITE/Short-READ/Long-READ Queue DEADLINE Scheduler, Hedged Reads, Short Circuit READ
  • 14. 14 What Advantages in Rakuten as Data Engineer You can go through all necessary domains of Big Data Platform to get rich experience for Big Data Platform Administrators. Rakuten has experts who have rich knowledges and experiences on each technical and management domain.
  • 15. 15 What Advantages in Rakuten as Data Engineer You can also work with various stakeholders from various service domain, from the point of data utilization. DB Services Event INFRA …