SlideShare a Scribd company logo
1 of 18
Kafka & Hadoop in Rakuten
Apr 21st, 2021
Yongduck Lee
Cloud Platform. Dept.
Rakuten Group, Inc.
2
What is Apache Kafka?
Apache Kafka is an open-source distributed event streaming platform used by
thousands of companies for high-performance data pipelines, streaming
analytics, data integration, and mission-critical applications.
• Unified platform for handling real-time data feeds
• High-throughput to support high volume event streams
• Graceful dealing with large data backlogs
• Low-latency delivery to handle more traditional messaging
use-cases.
• Fault-tolerance in the presence of machine failures
• Not use in-process cache of the data
https://kafka.apache.org
3
What is Elasticsearch?
Elasticsearch is a distributed, RESTful search and analytics engine capable
of addressing a growing number of use cases. As the heart of the Elastic
Stack, it centrally stores your data for lightning-fast search, fine-tuned
relevancy, and powerful analytics that scale with ease.
https://www.elastic.co/elasticsear
ch/
primary replica
Data Nodes
Master Nodes
ML Nodes
Coordinating Nodes
Transform Nodes
Remote Cluster
Nodes
Cluster A
Cluster B
Cluster C
Client
4
What is Hadoop?
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models.
https://hadoop.apache.org
It is designed to scale up from single
servers to thousands of machines, each
offering local computation and storage.
Rather than rely on hardware to deliver
high-availability, the library itself is
designed to detect and handle failures at
the application layer, so delivering a
highly-available service on top of a cluster
of computers, each of which may be
prone to failures.
5
Data Pipeline Concept
Data
Provider
Data Collection Data Wrangling Data Process &
Analysis
Visualization
• Data investigation
• Reporting
• Historical data
• Near time consumers
Realtime data (5-15sec)
• Realtime dashboards
• Traffic anomalies
• Initial research
• Recent data
• Real-Time Collection
• CDC
• Full Dump Data
System /Network
• Application logs
• Access logs
• Transactions logs
• OS Logs/ Network Traffic
User Behaviors
• Purchase
• Page View
• Click
• RQ/LDTime
• Geo Location
• Review
• Product Search
Service
Platform
Event / Product / Profile Info
• Email
• Campaign
• Questionnaires
• Product/Item
• Demography
Product
• Enrichment
• Normalization
• Cleaning
Data
Users
(Actors)
Heterogenous Data
INFRA
RCMD
RANKING
Log
Management
Data
Analysis
….
Near Real-Time Or Batch
- Unstructured data
- Semi-structured data
- Structured data
……
6
Data Pipeline Concept
Sub second / interactive investigation of
data as time series
Complex analytics, data processing, AI
etc over large datasets.
May take from seconds to days to run
depending on workload and processing
framework
7
Kafka in Rakuten
We have been providing Kafka Service from Kafka 0.8 to 2.4 with PLAINTEXT, SASL_PLANTEXT, and
SASL_TLS, Handling around 1.3 Million Message/sec ( 10 GB/sec IN/OUT) around peak time at normal date.
At 2021 Super Sale, we handled more than 2.5 times messages and traffics.
62 Kafka Clusters (7440 Core, 21TB Mem, 4972
Topics)
5th/Mar/2021
22:45 PM
77 Kafka Clusters (7904 Core, 22TB Mem, 5091
Topics)
08/Apr/2021
7.440 K
8
Kafka in Rakuten
NA EU JP
69
4
4
Near Real-Time One-way Mirroring
Cross-DC Active/Active | Active/Hot Standby Kafka
using MirrorMaker2 + KafkaConnect
9
Elasticsearch in Rakuten
We have been providing ES Service from 2.X to 7.X with Basic & Commercial Subscriptions, indexing
hundreds of thousands doc/sec for near-real time log management & monitoring and user behavior & KPI
analysis. At 2021 Super Sale, we handled more than 2 times docs and traffics.
47 ES Clusters (5960 Core, 6.4TB Mem, 71TB
Indices)
10
Hadoop In Rakuten
Vcore Mem Disk
72K 442TB 130 PB
RAM
Nodes
1K
08/Apr/2021
We are providing HDP2 & HDP3 Clusters in JP/EU/US regions. Our use case is very aggressive multi-tenants
who are using as data lake/data analysis/backup & archiving, etc. All CPU-intensive, Memory-intensive,
Disk-intensive use-case are running on clusters at the same time but we are providing high stability and
performance service with rich experiences on Hadoop administration from the 1st generation of Apache
Hadoop.
11
Hadoop in Rakuten
12
Challenge on Kafka
Mirroring Throughput between Region or Zone
- Temporary network failure.
- High Latency
- Location of MirrorMaker Pros & Cons
Instability or cluster broken
- High Load during Rebalancing or Recovery.
- Rack-awareness
- Major/Minor Upgrade or Patching
JDK & Cross-Realm Issue
- Consuming & Producing between Cluster
with different Realm or Service Name
- JDK Specification about Kerberos Authentication
OOM on Brokers or Zookeeper
- Many Consumer or Producer
- Large size of message
- Z-node creation
- Increase # of partitions
- Relocate Mirror Maker on Source Side and increase
Producer Parallelism
Parallelism
- Reduce size of data which will be replicated during
recovery or rebalancing by small servers with proper
size of DISK, CPU, and Mem for java/scalar
Scale-Out than Scale-Up
- Use Streaming Framework (Spark, Flink, and so on)
- Use Middleware which are supporting different
service name and Version.(NiFi)
- Use Global KDC and one Realm for Kafka Clusters
Global KDC and Proper Streaming Solution
- Guide users by proactive consultation as
professional.
- Authorization on ZK nodes
Confirm Use-Case and Dedicated ZK
13
Challenge on Elasticsearch
Mixed Indexing Query Pattern
• Doc/sec (100K doc/sec ~ 1K doc/sec)
• Size per index (1TB/hour~1GB/hour)
• Short- or Long-term query
Unbalanced Shard distribution
• total # of shard per nodes
• balance of high or low loads of shards per nodes
Too many Indices and shards
• long retention
• Many shards on index for load distribution.
Arbitrary Docs indexing on ES
• Arbitrary # of Json Field.
• Invalid data which are not matched with Data Type
• Too many Json Field in doc.
Fast Query in the middle of High load of indexing.
OOM on Data Nodes and Coordinating Nodes.
Hard to scale out only for High load index.
……
14
Challenge on Elasticsearch
Hot
Cold
Data Nodes
Master Nodes
Coordinating Nodes
Client
Coordinating Nodes
Hot
Cold
Hot
Cold HL
Group
ML
Group
LL
Group
Routing
SEH
Template
IDX
Move/Merge/READ-ONLY
15
Challenge on Elasticsearch
Hot
Cold
Data Nodes
Master Nodes
Coordinating Nodes
Client
Coordinating Nodes
Hot
Cold
Hot
Cold HL
Group
ML
Group
LL
Group
SEH
Template
IDX
Move/READ-ONLY
16
Challenge on Hadoop
Aggressive Multi-tenant on Big box of Cluster
- Job Pending or Execution Delay
- NameNode Slowdown
- Zookeeper Timeout
- NameNode Heap
- Localization Issues
- Large # of Files
High Performance & Low Cost
- CPU-Intensive
- Memory Intensive
- Disk-Intensive
Preemption
Federation
Zookeeper Separation
Continuous
balancing
Dedicated Node
with Labeling
Heterogenous
Proper Node Design
Based on Needs
Utilizing SSD & HDD
On-Premise
Training Course
NameNode RPC QoS
17
Future Challenges
Self-Service
• Self-Operation
• Data Profiling &
Governance
• Broker Level Administration
• Active-Active Mirroring
Next Generation
• Kafka vs ???
• Elasticsearch vs ???
Return To Apache Hadoop
• HDP Subscription Policy
• Ambari to Chef or Ansible
• Rakuten Distribution
Hadoop
Containerization
• Service Discovery
• Persistent Storage or Local
Storage
• Physical vs Logical
Separation
Kafka & Hadoop in Rakuten

More Related Content

What's hot

Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側Rakuten Group, Inc.
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
 
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...NTT DATA Technology & Innovation
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Rakuten Group, Inc.
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsNilesh Gule
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern ApplicationsAmazon Web Services
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioDevOpsDays Tel Aviv
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
 
The Case for Chaos
The Case for ChaosThe Case for Chaos
The Case for ChaosBruce Wong
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)NTT DATA Technology & Innovation
 
Observability at Scale
Observability at Scale Observability at Scale
Observability at Scale Knoldus Inc.
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersJean-Paul Azar
 

What's hot (20)

Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側モニタリングプラットフォーム開発の裏側
モニタリングプラットフォーム開発の裏側
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...
大規模データ処理の定番OSS Hadoop / Spark 最新動向 - 2021秋 -(db tech showcase 2021 / ONLINE 発...
 
Observability
ObservabilityObservability
Observability
 
Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2Introduction of Rakuten Commerce QA Night#2
Introduction of Rakuten Commerce QA Night#2
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 
Observability For Modern Applications
Observability For Modern ApplicationsObservability For Modern Applications
Observability For Modern Applications
 
Spark SQL - The internal -
Spark SQL - The internal -Spark SQL - The internal -
Spark SQL - The internal -
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
Elastic-Engineering
Elastic-EngineeringElastic-Engineering
Elastic-Engineering
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
The Case for Chaos
The Case for ChaosThe Case for Chaos
The Case for Chaos
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)
詳説探究!Cloud Native Databaseの現在地点(CloudNative Days Tokyo 2023 発表資料)
 
KafkaとPulsar
KafkaとPulsarKafkaとPulsar
KafkaとPulsar
 
Observability at Scale
Observability at Scale Observability at Scale
Observability at Scale
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 

Similar to Kafka & Hadoop in Rakuten

Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networkspbelko82
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsYousun Jeong
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Anthony Baker
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...DataStax
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john malloryAmazon Web Services
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problemsAbhishek Gupta
 

Similar to Kafka & Hadoop in Rakuten (20)

Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CIT
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john mallory
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 

More from Rakuten Group, Inc.

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話Rakuten Group, Inc.
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のりRakuten Group, Inc.
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Rakuten Group, Inc.
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みRakuten Group, Inc.
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開Rakuten Group, Inc.
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用Rakuten Group, Inc.
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfRakuten Group, Inc.
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfRakuten Group, Inc.
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technologyRakuten Group, Inc.
 
100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情Rakuten Group, Inc.
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャーRakuten Group, Inc.
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container ChallengesRakuten Group, Inc.
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Rakuten Group, Inc.
 
アジャイル開発とメトリクス
アジャイル開発とメトリクスアジャイル開発とメトリクス
アジャイル開発とメトリクスRakuten Group, Inc.
 
Improve test automation operation
Improve test automation operationImprove test automation operation
Improve test automation operationRakuten Group, Inc.
 

More from Rakuten Group, Inc. (19)

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
What Makes Software Green?
What Makes Software Green?What Makes Software Green?
What Makes Software Green?
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組み
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdf
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
OWASPTop10_Introduction
OWASPTop10_IntroductionOWASPTop10_Introduction
OWASPTop10_Introduction
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technology
 
100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー
 
Unclouding Container Challenges
 Unclouding  Container Challenges Unclouding  Container Challenges
Unclouding Container Challenges
 
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
Functional Programming in Pattern-Match-Oriented Programming Style <Programmi...
 
アジャイル開発とメトリクス
アジャイル開発とメトリクスアジャイル開発とメトリクス
アジャイル開発とメトリクス
 
AR/SLAM and IoT
AR/SLAM and IoTAR/SLAM and IoT
AR/SLAM and IoT
 
Improve test automation operation
Improve test automation operationImprove test automation operation
Improve test automation operation
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Kafka & Hadoop in Rakuten

  • 1. Kafka & Hadoop in Rakuten Apr 21st, 2021 Yongduck Lee Cloud Platform. Dept. Rakuten Group, Inc.
  • 2. 2 What is Apache Kafka? Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. • Unified platform for handling real-time data feeds • High-throughput to support high volume event streams • Graceful dealing with large data backlogs • Low-latency delivery to handle more traditional messaging use-cases. • Fault-tolerance in the presence of machine failures • Not use in-process cache of the data https://kafka.apache.org
  • 3. 3 What is Elasticsearch? Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning-fast search, fine-tuned relevancy, and powerful analytics that scale with ease. https://www.elastic.co/elasticsear ch/ primary replica Data Nodes Master Nodes ML Nodes Coordinating Nodes Transform Nodes Remote Cluster Nodes Cluster A Cluster B Cluster C Client
  • 4. 4 What is Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. https://hadoop.apache.org It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
  • 5. 5 Data Pipeline Concept Data Provider Data Collection Data Wrangling Data Process & Analysis Visualization • Data investigation • Reporting • Historical data • Near time consumers Realtime data (5-15sec) • Realtime dashboards • Traffic anomalies • Initial research • Recent data • Real-Time Collection • CDC • Full Dump Data System /Network • Application logs • Access logs • Transactions logs • OS Logs/ Network Traffic User Behaviors • Purchase • Page View • Click • RQ/LDTime • Geo Location • Review • Product Search Service Platform Event / Product / Profile Info • Email • Campaign • Questionnaires • Product/Item • Demography Product • Enrichment • Normalization • Cleaning Data Users (Actors) Heterogenous Data INFRA RCMD RANKING Log Management Data Analysis …. Near Real-Time Or Batch - Unstructured data - Semi-structured data - Structured data ……
  • 6. 6 Data Pipeline Concept Sub second / interactive investigation of data as time series Complex analytics, data processing, AI etc over large datasets. May take from seconds to days to run depending on workload and processing framework
  • 7. 7 Kafka in Rakuten We have been providing Kafka Service from Kafka 0.8 to 2.4 with PLAINTEXT, SASL_PLANTEXT, and SASL_TLS, Handling around 1.3 Million Message/sec ( 10 GB/sec IN/OUT) around peak time at normal date. At 2021 Super Sale, we handled more than 2.5 times messages and traffics. 62 Kafka Clusters (7440 Core, 21TB Mem, 4972 Topics) 5th/Mar/2021 22:45 PM 77 Kafka Clusters (7904 Core, 22TB Mem, 5091 Topics) 08/Apr/2021 7.440 K
  • 8. 8 Kafka in Rakuten NA EU JP 69 4 4 Near Real-Time One-way Mirroring Cross-DC Active/Active | Active/Hot Standby Kafka using MirrorMaker2 + KafkaConnect
  • 9. 9 Elasticsearch in Rakuten We have been providing ES Service from 2.X to 7.X with Basic & Commercial Subscriptions, indexing hundreds of thousands doc/sec for near-real time log management & monitoring and user behavior & KPI analysis. At 2021 Super Sale, we handled more than 2 times docs and traffics. 47 ES Clusters (5960 Core, 6.4TB Mem, 71TB Indices)
  • 10. 10 Hadoop In Rakuten Vcore Mem Disk 72K 442TB 130 PB RAM Nodes 1K 08/Apr/2021 We are providing HDP2 & HDP3 Clusters in JP/EU/US regions. Our use case is very aggressive multi-tenants who are using as data lake/data analysis/backup & archiving, etc. All CPU-intensive, Memory-intensive, Disk-intensive use-case are running on clusters at the same time but we are providing high stability and performance service with rich experiences on Hadoop administration from the 1st generation of Apache Hadoop.
  • 12. 12 Challenge on Kafka Mirroring Throughput between Region or Zone - Temporary network failure. - High Latency - Location of MirrorMaker Pros & Cons Instability or cluster broken - High Load during Rebalancing or Recovery. - Rack-awareness - Major/Minor Upgrade or Patching JDK & Cross-Realm Issue - Consuming & Producing between Cluster with different Realm or Service Name - JDK Specification about Kerberos Authentication OOM on Brokers or Zookeeper - Many Consumer or Producer - Large size of message - Z-node creation - Increase # of partitions - Relocate Mirror Maker on Source Side and increase Producer Parallelism Parallelism - Reduce size of data which will be replicated during recovery or rebalancing by small servers with proper size of DISK, CPU, and Mem for java/scalar Scale-Out than Scale-Up - Use Streaming Framework (Spark, Flink, and so on) - Use Middleware which are supporting different service name and Version.(NiFi) - Use Global KDC and one Realm for Kafka Clusters Global KDC and Proper Streaming Solution - Guide users by proactive consultation as professional. - Authorization on ZK nodes Confirm Use-Case and Dedicated ZK
  • 13. 13 Challenge on Elasticsearch Mixed Indexing Query Pattern • Doc/sec (100K doc/sec ~ 1K doc/sec) • Size per index (1TB/hour~1GB/hour) • Short- or Long-term query Unbalanced Shard distribution • total # of shard per nodes • balance of high or low loads of shards per nodes Too many Indices and shards • long retention • Many shards on index for load distribution. Arbitrary Docs indexing on ES • Arbitrary # of Json Field. • Invalid data which are not matched with Data Type • Too many Json Field in doc. Fast Query in the middle of High load of indexing. OOM on Data Nodes and Coordinating Nodes. Hard to scale out only for High load index. ……
  • 14. 14 Challenge on Elasticsearch Hot Cold Data Nodes Master Nodes Coordinating Nodes Client Coordinating Nodes Hot Cold Hot Cold HL Group ML Group LL Group Routing SEH Template IDX Move/Merge/READ-ONLY
  • 15. 15 Challenge on Elasticsearch Hot Cold Data Nodes Master Nodes Coordinating Nodes Client Coordinating Nodes Hot Cold Hot Cold HL Group ML Group LL Group SEH Template IDX Move/READ-ONLY
  • 16. 16 Challenge on Hadoop Aggressive Multi-tenant on Big box of Cluster - Job Pending or Execution Delay - NameNode Slowdown - Zookeeper Timeout - NameNode Heap - Localization Issues - Large # of Files High Performance & Low Cost - CPU-Intensive - Memory Intensive - Disk-Intensive Preemption Federation Zookeeper Separation Continuous balancing Dedicated Node with Labeling Heterogenous Proper Node Design Based on Needs Utilizing SSD & HDD On-Premise Training Course NameNode RPC QoS
  • 17. 17 Future Challenges Self-Service • Self-Operation • Data Profiling & Governance • Broker Level Administration • Active-Active Mirroring Next Generation • Kafka vs ??? • Elasticsearch vs ??? Return To Apache Hadoop • HDP Subscription Policy • Ambari to Chef or Ansible • Rakuten Distribution Hadoop Containerization • Service Discovery • Persistent Storage or Local Storage • Physical vs Logical Separation