Customer Story: Integrated Logging Platform for Game Services Using the Elastic Stack

1
Choosing ‘ElasticSearch’ for high-performance, high-availability online e-commerce big-data analysis
Bosoon Kim
CTO (Builton Co., Ltd.)
February 22, 2018
http://www.builton.co.kr/en

2
BuiltOn
• The scale of e-commerce worldwide grows day by day.
• In e-commerce, data analysis is essential for companies deciding what to do and how to do it.
• We analyze many aspects of retailers, sellers and consumers across the e-commerce industry.
• Many companies in South Korea, including global companies, are using BuiltOn’s data to analyze e-commerce big data.
• We also collaborate with global data analysis partners.

3
What is e-commerce analysis?
What does BuiltOn analyze?

4
Necessity of e-commerce analysis
• Why isn’t my product selling?
• What are our competitors’ sales strategies?
• What do consumers who bought our products or services think?
• What are the best-selling products?
• What is the most effective way to use ads to boost sales?
• Who is selling our products?
• How much of our product is sold?
• Where are our products sold?
• And there are many more questions in e-commerce.
People who work in e-commerce want answers to all of them.

5
The e-commerce big-data analysis process
It follows the same flow as typical big-data analytics:
1. E-commerce big-data warehouse configuration
2. Data collection, data refining and data quality control
3. Configuring aggregate data marts
4. Visualization
5. Derivation of KPIs (key performance indicators)

6
Analysis based on the digital shelf
• Collects search results for categories and keywords on a target online retailer.
• Analyzes the digital shelf share of manufacturers and brands.
• Shows the market penetration of my products and competitors’ products.
• Also shows the share of advertising by manufacturer, brand and product.
• Top positions matter: consumers are more likely to choose products shown at the top of search results.
[Figure: Brand analysis in the TV category digital shelf for a target retailer. Pie charts of brand share for all listings (132), normal listings (120) and advertisements (12). A Electronics (57 listings) and B Electronics (54) lead the shelf.]
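The share percentages in the figure are ratios of listing counts. Below is a minimal Python sketch of that calculation. It assumes each collected search result has already been tagged with a brand and an advertisement flag. The `ShelfItem` record and the function names are hypothetical, not BuiltOn’s actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ShelfItem:
    """One listing on a search-result page (hypothetical record layout)."""
    brand: str
    is_ad: bool


def shelf_share(items):
    """Return {brand: (listing_count, percent_of_listings)}."""
    counts = Counter(item.brand for item in items)
    total = sum(counts.values())
    return {
        brand: (n, round(100.0 * n / total, 2))
        for brand, n in counts.most_common()
    }


def shelf_share_by_type(items):
    """Compute shares for all listings, normal listings and advertisements."""
    return {
        "total": shelf_share(items),
        "normal": shelf_share([i for i in items if not i.is_ad]),
        "advertisement": shelf_share([i for i in items if i.is_ad]),
    }
```

Suppose you feed in the normal-listing counts from the chart: 57 A Electronics, 54 B Electronics, 3 D and 6 others. The sketch then gives 47.5%, 45.0%, 2.5% and 5.0%, which matches the normal-listing panel.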

7
Screenshot example: Top-5 shelf share

8
Analysis based on price
• Analyzes product prices by seller and online retailer as a time series.
• For the same product, consumers are most likely to buy the lowest-priced offer.
• If the same goods are sold much more cheaply abroad, consumers are unwilling to buy them locally.
• The lower the price, the less profit the seller makes.
[Chart: Minimum Advertised Price (MAP) violations by resellers. Price trend of CHANEL SUBLIMAGE LA CR. TS at Retailers A to E; price axis from 420,000 to 580,000.]
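Detecting MAP violations comes down to comparing each observed price with the minimum advertised price. The sketch below assumes that price observations have already been collected per retailer and day. `PriceObservation`, the function names and the numbers used to test it are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceObservation:
    """A price seen on a retailer's page on a given day (hypothetical layout)."""
    retailer: str
    day: date
    price: int


def map_violations(observations, map_price):
    """Observations advertised below MAP, ordered by day, then retailer."""
    return sorted(
        (o for o in observations if o.price < map_price),
        key=lambda o: (o.day, o.retailer),
    )


def violation_counts(observations, map_price):
    """Number of MAP violations per retailer."""
    counts = {}
    for o in map_violations(observations, map_price):
        counts[o.retailer] = counts.get(o.retailer, 0) + 1
    return counts


def lowest_price_by_day(observations):
    """Time series of {day: (lowest_price, retailer)}; ties go to the retailer name sorted first."""
    best = {}
    for o in observations:
        current = best.get(o.day)
        if current is None or (o.price, o.retailer) < current:
            best[o.day] = (o.price, o.retailer)
    return dict(sorted(best.items()))
```

Keeping the lowest price per day alongside the violation list supports both points on this slide: consumers gravitate to the cheapest offer, and resellers undercutting MAP erode the seller’s margin.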

9
Screenshot example: Official and unofficial price trends

10
Analysis based on customer reviews
• Analyzes customer reviews of products.
• Analyzes positive and negative customer reactions to product characteristics expressed in comments.
• Identifies problems with your products and competitors’ products.
• Tracks the sales trend of your products by totaling purchase counts on e-commerce websites.
[Chart: Product review trend and review satisfaction rate for “Instant rice 210g x 1”]
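The slides do not define the review satisfaction rate. The minimal sketch below computes a monthly review trend under two assumptions: ratings are 1 to 5 stars, and a rating of 4 or more counts as “satisfied”. `Review` and `monthly_review_trend` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Review:
    """One customer review (hypothetical layout, 1-5 star rating)."""
    day: date
    rating: int


def monthly_review_trend(reviews, satisfied_threshold=4):
    """Return {"YYYY-MM": (review_count, satisfaction_rate_percent)}.

    A review counts as satisfied when rating >= satisfied_threshold.
    The threshold is an assumption, not a definition from the slides.
    """
    buckets = defaultdict(lambda: [0, 0])  # month -> [reviews, satisfied]
    for r in reviews:
        key = f"{r.day.year:04d}-{r.day.month:02d}"
        buckets[key][0] += 1
        if r.rating >= satisfied_threshold:
            buckets[key][1] += 1
    return {
        month: (total, round(100.0 * satisfied / total, 1))
        for month, (total, satisfied) in sorted(buckets.items())
    }
```

Positive and negative reactions to individual product characteristics would need text analysis of the comments. This sketch covers only the rating-based trend line.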

11
Screenshot example: Consumer review analytics

12
Analysis based on consumer behavior
• Provides the real-time inflow status of online product pages.
• Tracks consumer behavior for each product:
‒ Number of purchase button clicks
‒ Number of cart button clicks
‒ Sales success rate
• Provides tracking reports broken down by analysis platform, keyword and ad.
[Chart: 100% stacked chart of platform share (PC, Mobile, App) over time; y-axis from 0 to 100%]
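Both the per-product metrics and the 100% stacked platform chart can be derived from a stream of page events. This sketch assumes a hypothetical `PageEvent` record with four event kinds. It takes “sales success rate” to mean completed purchases divided by page views. That definition is an assumption, because the slides do not define the metric.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEvent:
    """One tracked event on a product page (hypothetical layout).

    kind: "view", "cart_click", "purchase_click" or "purchase".
    platform: "PC", "Mobile" or "App".
    """
    hour: str
    product: str
    platform: str
    kind: str


def product_funnel(events):
    """Per-product click counts and an assumed sales success rate."""
    counts = defaultdict(Counter)
    for e in events:
        counts[e.product][e.kind] += 1
    report = {}
    for product, c in counts.items():
        views = c["view"]
        report[product] = {
            "purchase_clicks": c["purchase_click"],
            "cart_clicks": c["cart_click"],
            # Assumption: success rate = completed purchases / page views.
            "success_rate": round(100.0 * c["purchase"] / views, 1) if views else 0.0,
        }
    return report


def platform_share(events):
    """Percentage of page views per platform in each time bucket."""
    per_bucket = defaultdict(Counter)
    for e in events:
        if e.kind == "view":
            per_bucket[e.hour][e.platform] += 1
    return {
        hour: {p: round(100.0 * n / sum(c.values()), 1) for p, n in sorted(c.items())}
        for hour, c in sorted(per_bucket.items())
    }
```

Each bucket’s shares sum to roughly 100% (up to rounding), which is the shape a 100% stacked chart expects.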

13
Screenshot example: Real-time product page status

14
Retrospective from
2012 to 2016
A bit embarrassed …

15
Starting Architecture
Diagram components, each deployed on X nodes:
• Retailer information collected:
‒ Product title
‒ Price and discount ratio
‒ Card promotions
‒ Digital shelves
‒ Reviews
‒ Sellers
‒ Etc…
• Network controller and network gateway
• Data collection engine
• Batch process
• RDBMS (with replication)
• Text search engine based on the RDBMS
• Business server, web service, data mart and visualization

16
Reasons for the initial architecture
• A familiar development environment:
‒ C/C++
‒ Lua scripting engine
‒ Relational databases with fixed column schemas, such as MySQL, SQL Server, PostgreSQL…
• A separate data collection engine instance ran for each user.
• Own hardware instead of cloud platforms such as Amazon Web Services:
‒ Cloud platform costs are very high.
‒ BuiltOn manages its own hardware infrastructure to provide efficient architecture and services to partners.
• In-house visualization.

17
Developed almost every architecture component in-house
• Full-text search engine
‒ A search engine is needed to find the products you want in big data.
• Monitoring system
‒ CPU, memory, disk, network traffic, etc.
• Data replication into customers’ storage
‒ Interpreting and replicating the RDBMS event log.
‒ Customers want refined data replicated to their own data centers.

18
AS THE COMPANY GREW, IT FACED NEW PROBLEMS.

19
As the company grew…
• The limits of the RDBMS were exposed.
‒ The system slowed down.
‒ It was hard to accommodate each customer’s customizations.
‒ Columns were added that other customers did not need.
‒ Adding columns wasted too much time.
‒ Indexing time increased.
‒ Replication synchronization issues were frequent.
‒ Full-text search took a long time.
‒ The RDBMS cluster did not get much faster even as nodes were added.
• Storage scale-up costs were too high.
‒ First, HDDs
‒ Next, SSDs
‒ Finally, high-performance NVMe SSDs
‒ Far too expensive
There have been many technical issues.
[Diagram: storage cost and maintenance cost vs. storage performance]

20
As the company grew…
• Too much time spent developing visualizations.
• Difficulty analyzing OS logs.
• Long downtime after hardware failures.
• Recurring development work to solve issues.
There have been many technical issues.

22
What happened, and why?
• An excessive appetite for development and testing.
• An enormous amount of stored data.
• The belief that hardware scale-up would solve everything.
• A lack of understanding of the latest analytics trends.

23
It’s the economy, stupid
James Carville

24
What should be changed?
• At a minimum, performance must be much faster than it is now.
‒ Without expensive NVMe SSDs.
• Schema-free storage for flexible data management.
• Minimal downtime when hardware is replaced.
• A storage engine that supports full-text search without a separate search engine.
• Automatic archiving of old data to low-cost storage.
An excessive appetite for development and testing wastes time and money.

26
Our own evaluation of existing storage engines
• RDBMS cluster
‒ Adding nodes increased storage capacity, but performance was not satisfactory.
• Couchbase NoSQL database
‒ Random access is good, but sequential access is poor. The system crashed as the data grew. (This may have changed since.)
• HDFS
‒ Reliable, high-capacity storage is a strength, but everything else must be built by the developers.

27
Suddenly, the worst happened.
• One analytic report required 3 minutes of aggregation and processing.
• Because users change many of the input parameters, the results cannot be precalculated.
• The customer wanted the output as soon as they clicked.
• It was an unreasonable, excessive demand that our environment could not handle.
One day…

28
ElasticSearch
• We could not use an unstable, unreliable storage engine.
• We met ElasticSearch while trying to solve these problems.
• We moved all the data from the RDBMS to ElasticSearch and delivered the reports within the time the customer required.
First encounter.

29
Response time comparison: ×60 faster
• RDBMS: high-performance NVMe SSD (420,000 IOPS), 1 node → 3m 3s (about 180 seconds) response time
• ElasticSearch: normal SSD (96,000 IOPS), 2 nodes → 3 seconds response time
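The ×60 figure follows directly from the two response times if you use the rounded 180 seconds for the RDBMS. The exact 3m 3s, or 183 s, would give about 61:

$$
\text{speedup} = \frac{t_{\text{RDBMS}}}{t_{\text{ElasticSearch}}} = \frac{180\ \text{s}}{3\ \text{s}} = 60,
\qquad
\frac{420{,}000\ \text{IOPS}}{96{,}000\ \text{IOPS}} \approx 4.4
$$

ElasticSearch reached this speedup on storage rated at roughly 4.4 times fewer IOPS than the NVMe setup.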

30
Amazing performance with ElasticSearch
BuiltOn

31
New architecture
design in 2017
Based on ElasticSearch

32
New Architecture
Diagram components, each deployed on X nodes unless noted:
• Retailer information
‒ Product title, price, card promotions
‒ Digital shelves
‒ Shopper reviews
• Network controller and network gateway
• Central scheduler (Node.js) and job workers (Node.js)
• ETL & ELT
• Elasticsearch with X-Pack
‒ Master nodes (3)
‒ Ingest nodes (X)
‒ Hot data nodes (X)
‒ Warm data nodes (X)
• Servers monitored with Metricbeat and X-Pack
• Refinement: instances (X) and Elasticsearch nodes (X)
• RDBMS data mart
• Visualization based on business intelligence
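The three master nodes and the separate ingest, hot and warm data nodes correspond to per-node settings. The Python sketch below renders `elasticsearch.yml` fragments using Elasticsearch 6.x-era settings: `node.master`, `node.data`, `node.ingest` and `discovery.zen.minimum_master_nodes`. It also uses a custom node attribute named `box_type` for hot-warm allocation. The attribute name, node names and role counts are assumptions for illustration. Later releases replace the role flags with `node.roles` and no longer use `discovery.zen.minimum_master_nodes`.

```python
# Role flags use Elasticsearch 6.x-era settings. "box_type" is a custom
# node attribute (its name is an assumption) used for hot-warm allocation.
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    "ingest": {"node.master": False, "node.data": False, "node.ingest": True},
    "hot": {"node.master": False, "node.data": True, "node.ingest": False,
            "node.attr.box_type": "hot"},
    "warm": {"node.master": False, "node.data": True, "node.ingest": False,
             "node.attr.box_type": "warm"},
}


def render_node_config(role, cluster_name, node_name):
    """Render an elasticsearch.yml fragment for one node of the given role."""
    if role not in ROLE_SETTINGS:
        raise ValueError(f"unknown role: {role}")
    settings = {"cluster.name": cluster_name, "node.name": node_name}
    settings.update(ROLE_SETTINGS[role])
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def cluster_plan(counts, cluster_name):
    """Return {node_name: yml_text} for a cluster with the given role counts."""
    masters = counts.get("master", 0)
    if masters < 1:
        raise ValueError("need at least one master-eligible node")
    quorum = masters // 2 + 1  # majority of master-eligible nodes
    plan = {}
    for role, n in counts.items():
        for i in range(1, n + 1):
            name = f"{role}-{i}"
            text = render_node_config(role, cluster_name, name)
            text += f"discovery.zen.minimum_master_nodes: {quorum}\n"
            plan[name] = text
    return plan
```

With three master-eligible nodes, the quorum is 3 // 2 + 1 = 2. The cluster can therefore lose one master-eligible node and still elect a master. This is a common reason to run exactly three dedicated masters.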

33
What have we changed?
• Replaced the RDBMS storage engine with ElasticSearch.
• Run full-text search directly in ElasticSearch.
• Switched system monitoring to Metricbeat.
• Use hot-warm nodes instead of backing up old data separately.
‒ Old data is kept on low-cost hardware such as HDDs.
• No longer run RDBMS data replication.
‒ We rely on ElasticSearch shards and replicas.
• When capacity runs short, just add a new node.
‒ ElasticSearch is fast and easy to scale out.
We’ve changed everything that can be replaced by ElasticSearch.
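Moving old data to the warm tier can be expressed as an index settings change. Setting `index.routing.allocation.require.box_type` to `warm` makes Elasticsearch relocate that index’s shards to nodes whose `box_type` attribute is `warm`. The sketch below only plans which daily indices to move. The `shelf-YYYY.MM.DD` index naming and the 7-day hot window are hypothetical.

```python
from datetime import date, datetime, timedelta

# Settings body that pins an index to nodes started with node.attr.box_type: warm.
WARM_SETTINGS = {"index.routing.allocation.require.box_type": "warm"}


def index_date(index_name: str, prefix: str):
    """Parse the date from a daily index name such as 'shelf-2018.02.22'."""
    if not index_name.startswith(prefix):
        return None
    try:
        return datetime.strptime(index_name[len(prefix):], "%Y.%m.%d").date()
    except ValueError:
        return None


def indices_to_move(index_names, prefix: str, today: date, hot_days: int) -> dict:
    """Return {index_name: settings_body} for daily indices older than hot_days."""
    cutoff = today - timedelta(days=hot_days)
    plan = {}
    for name in sorted(index_names):
        day = index_date(name, prefix)
        if day is not None and day < cutoff:
            plan[name] = dict(WARM_SETTINGS)
    return plan
```

Each planned body would be sent with `PUT /<index>/_settings`. Elasticsearch Curator, or index lifecycle management from 6.6 onward, can automate the same step.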

34
Changed architecture comparison
Item | Old: RDBMS | New: ElasticSearch
Data type | Column-based | Document-based
Schema-free support | N/A | Yes
Real-time analysis response time | Slow | Very fast
Downtime | Long | Almost none
Storage expansion policy | Scale-up | Scale-out
Storage cost | Expensive | Cheap
SSD type | High-performance server-side NVMe | Normal server-side SSD
CPU | Xeon E5-2620 v4 2.10GHz ×2 | Xeon E5-2620 v4 2.10GHz
Memory | 512GB per node | 64GB per node
Data distribution | N/A | Shards
Backup | Replication | Replication
Full-text search | In-house development | Built-in support
Archiving | Individual backups to HDD | Hot-warm nodes
System monitoring | In-house development | Metricbeat
Visualization | In-house development | Kibana, Tableau, etc.

35
Before: RDBMS
• Expensive CPUs
• 6 nodes with server-side NVMe SSDs
• 512GB memory per node
• Replication-based backup policy
• Sometimes slow response times
• Daily data throughput: 30GB
After: ElasticSearch
• Cheap CPUs
• 17 nodes with normal server-side SSDs
• 64GB memory per node
• Multi-shard cluster
• Very fast response times
• Daily data throughput: 500GB
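Working through the slide’s figures:

$$
\frac{500\ \text{GB/day}}{30\ \text{GB/day}} \approx 16.7,
\qquad
6 \times 512\ \text{GB} = 3072\ \text{GB},
\qquad
17 \times 64\ \text{GB} = 1088\ \text{GB}
$$

The new cluster therefore handles roughly 16.7 times the daily data with about 35% of the old cluster’s total memory (1088 / 3072 ≈ 0.35).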

36
Technical Support
• Rapid, expert technical support.
‒ We restarted some nodes.
‒ Afterwards, primary shard data was not redistributed.
‒ In the worst case, data loss could have occurred.
‒ We asked for technical support and were able to solve the problem quickly.
‒ The cause turned out to be that the index recovery setting had been turned off.
‒ We still use technical support whenever we have questions.
X-PACK

37
Future work
Based on ElasticSearch

38
Future work
• Running ElasticSearch in Docker containers.
• Infographics using Canvas.
• Buzz analysis using Nori, the Korean morphological analyzer for Elasticsearch.
• Network monitoring with Packetbeat.
• Monitoring e-commerce big-data properties using Kibana.
• Applying Logstash to ETL and ELT.
We can do more with ElasticSearch.

39
Don’t be greedy!
Just use ElasticSearch!
Thank you.
