Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
2. Who am I?
§ Senior Software Engineer at SK Telecom, South Korea’s largest wireless communications provider
§ Work on commercial products (~ ’17)
- Worked with Big Data Solution
- Worked with IaaS (OpenStack)
- Worked with PaaS (CloudFoundry)
§ Mail to : jerryjung@apache.org
3. Table of Contents
§ Big Data in SK Telecom
§ History of SKT's big data
§ Overall Architecture
§ Use case: Real-Time Network Analytics
4. Big Data in SKT in a Nutshell
§ Data Size
- Currently collecting 100 TB/day
§ Big Data Management Infrastructure
- Hadoop cluster (1400+ nodes); migrated from MPP RDBMS
§ Overall Architecture
- Spark
- Druid
§ Real-Time Network Analytics
- Real-Time Processing
- Hadoop DW
- Big Data Discovery
6. History of SKT’s Big Data
§ 2013
- Batch Processing (Daily)
- Map-Reduce Programming
- Hadoop HDFS
§ 2014
- Batch Processing (Hourly, Daily)
- SQL on Hadoop
- Hive (UDF, UDAF)
§ 2015
- Real-time Processing (Near real-time)
- Hadoop DW
- Spark (Streaming, SQL)
§ Now
- Big Data OLAP cube
- Self Data Discovery
- Druid
7. Overall Architecture
§ Designed to handle both real-time & batch data processing and high level analysis using
Spark and Druid as a core technology
[Figure: overall architecture. Layers: Interface, Batch Processing, Real-Time, Analytics, Data Service. Components: Flume, Kafka, HDFS, Oozie (workflow), Spark (ETL), Hadoop EDW, Spark Streaming, NoSQL, Elasticsearch, Spark SQL, Spark MLlib, Jupyter (R, Python), Druid (Mart), Metatron (BI), legacy apps, YARN (unified resource manager), Kubernetes, H/W accelerators (SSD, FPGA), provisioning (PXE boot/Chef).]
8. Benefits of Spark
§ Spark helps us gain processing speed and implement various big data applications quickly and easily
§ Why SKT uses Spark…
- Support for Event Stream Processing
- Fast Data Queries in Real Time
- Improved Programmer Productivity
- Fast Batch Processing of Large Data Sets
10. Benefits of Druid
§ Druid is a distributed in-memory OLAP data store. It features timestamp-based sharding, columnar indexing and compression, and pre-aggregation of metrics
§ Why SKT uses Druid…
- Sub-second processing capability
- Stores aggregated summary data for time-series data
- Separate processing engines (real-time and historical) support analytics at the same time
[Figure: Druid architecture. Components: real-time nodes, historical nodes, broker, coordinator, metadata, deep storage (HDFS/S3). Flows: streaming data and batch data indexing, data hand-off, data segments, queries.]
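The pre-aggregation idea mentioned on this slide can be sketched in plain Python. This is only an illustration of rolling raw events up into time buckets, not Druid's implementation; the `rollup` helper and the event fields (`ts` in epoch seconds, `cell`, `bytes`) are hypothetical names chosen for the example.

```python
from collections import defaultdict

def rollup(events, granularity_seconds=3600):
    """Pre-aggregate raw events into (time bucket, dimension) summary rows.

    Each event is a dict with hypothetical keys: 'ts' (epoch seconds),
    'cell' (a dimension) and 'bytes' (a metric).
    """
    summary = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for event in events:
        # Truncate the timestamp to the start of its bucket.
        bucket = event["ts"] - event["ts"] % granularity_seconds
        row = summary[(bucket, event["cell"])]
        row["count"] += 1
        row["bytes_sum"] += event["bytes"]
    return dict(summary)

# Made-up events: two fall in the same hour, one in the next hour.
events = [
    {"ts": 36005, "cell": "A", "bytes": 100},
    {"ts": 36900, "cell": "A", "bytes": 50},
    {"ts": 39601, "cell": "A", "bytes": 10},
]
hourly = rollup(events)
```

Queries then read the smaller summary rows instead of every raw event, which is the trade-off the slide describes: some ingestion work up front in exchange for faster aggregate queries.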
11. Druid vs Spark Performance Comparison
§ Druid and Spark produce different results because the engines are designed differently.
§ Druid vs Spark
- Druid converts data into OLAP-optimized, pre-aggregated, indexed, columnar structures
- Druid has separate ingestion overhead
- Druid is excellent in terms of memory and disk I/O compared to Spark
- Spark is able to process all TPC-H queries
https://github.com/jaehc/tpch-spark/tree/feature-run-multiple-queries
http://druid.io/blog/2014/03/17/benchmarking-druid.html
12. Druid vs Spark Performance Comparison
§ SUM_ALL_YEAR
- SELECT YEAR(L_SHIPDATE), SUM(L_EXTENDEDPRICE), SUM(L_DISCOUNT), SUM(L_TAX), SUM(L_QUANTITY)
  FROM LINEITEM
  GROUP BY YEAR(L_SHIPDATE)
§ TOP_100_PARTS_DETAILS
- SELECT L_PARTKEY, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE), MIN(L_DISCOUNT), MAX(L_DISCOUNT)
  FROM LINEITEM
  GROUP BY L_PARTKEY
  ORDER BY SUM(L_QUANTITY) DESC
  LIMIT 100
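To see the shape of the SUM_ALL_YEAR result, the query can be tried on a few rows with Python's built-in `sqlite3` module. This is a sketch under assumptions: the rows are made up rather than TPC-H data, SQLite stands in for Spark or Druid, and because SQLite has no `YEAR()` function, `strftime('%Y', ...)` is used in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_shipdate TEXT, l_extendedprice REAL, l_discount REAL, "
    "l_tax REAL, l_quantity REAL)"
)
# Made-up rows for illustration only; not TPC-H data.
rows = [
    ("1995-03-01", 100.0, 0.05, 0.02, 10),
    ("1995-07-15", 200.0, 0.10, 0.04, 20),
    ("1996-01-20", 300.0, 0.00, 0.08, 30),
]
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?)", rows)

# SQLite has no YEAR(); strftime('%Y', ...) stands in for it here.
sum_all_year = conn.execute(
    "SELECT strftime('%Y', l_shipdate) AS ship_year, "
    "SUM(l_extendedprice), SUM(l_discount), SUM(l_tax), SUM(l_quantity) "
    "FROM lineitem GROUP BY ship_year ORDER BY ship_year"
).fetchall()
```

The result has one row per ship year, which is why this kind of query suits a store that pre-aggregates by time.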
13. Use cases : Summary
§ Real-Time Network Analytics
- TANGO-D: TANGO (T Advanced Next Generation OSS)-D (Data warehouse); end-to-end network quality assurance and timely fault analysis
- APOLLO (Analytics PlatfOrm for inteLLigent Operation): real-time analysis of the radio access network to improve operation efficiency
§ Metatron Discovery
- Metatron: a big data discovery & analytics solution developed by SKT
- Interactive analysis for network engineers, operators, and data scientists
14. Use Case 1: Apollo Real-Time Analytics
§ APOLLO aims to improve mobile user experience, reduce operation cost, and improve
operation efficiency by analyzing radio access networks
[Figure: APOLLO real-time analytics platform. Input data: call data, RF signal, customer/service data, device data, A/F/S. Platform functions: data collecting, analytics-based control, OAM, real-time monitoring & optimization, predictive analysis, service analysis. Analytics output: root cause finding, anomaly detection, optimization, resource monitoring, KPI detection. Other labels: operator, engineering optimization, network intelligence.]
* APOLLO : Analytics PlatfOrm for inteLLigent Operation
15. Use Case 1: Apollo Real-Time Analytics
§ APOLLO collects and analyzes raw data from base stations in real time to optimize the
service performance
§ Spark Streaming
- Processes raw data to obtain statistics every 10 seconds
- Automatically detects anomalies
§ Real-Time User/Service Level Optimization
- Predicts traffic variation and base station performance
- Minimizes degradation in base station and user performance
[Figure: real-time analytics pipeline. Components: base station, Kafka, Spark Streaming (data parsing, data converting, RDD, real-time processing), Spark, Elasticsearch, storage, dashboard.]
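The 10-second statistics step can be illustrated with a small plain-Python sketch. It is a stand-in for the windowing logic, not Spark Streaming code and not APOLLO's implementation. The record layout, the station names, and the fixed-threshold rule in `detect_abnormal` are all assumptions; the slides do not say how abnormality is detected.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 10  # the slide's 10-second statistics interval

def window_stats(records):
    """Group (epoch_seconds, station_id, value) records into 10-second
    tumbling windows and return {(window_start, station_id): mean value}."""
    buckets = defaultdict(list)
    for ts, station, value in records:
        window_start = ts - ts % WINDOW_SECONDS
        buckets[(window_start, station)].append(value)
    return {key: mean(values) for key, values in buckets.items()}

def detect_abnormal(stats, min_value):
    """Return the windows whose mean falls below a fixed threshold.
    The fixed threshold is an assumption made for this example."""
    return sorted(key for key, avg in stats.items() if avg < min_value)

# Made-up measurements for two hypothetical base stations.
records = [
    (100, "bs-1", 20.0), (104, "bs-1", 22.0),  # window starting at 100
    (111, "bs-1", 5.0), (118, "bs-1", 7.0),    # window starting at 110
    (103, "bs-2", 30.0),
]
stats = window_stats(records)
abnormal = detect_abnormal(stats, min_value=10.0)
```

In a streaming engine the same grouping would run continuously per micro-batch; here it runs once over a list to keep the example self-contained.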
16. Use case 2: TANGO-D
§ TANGO-D is a Hadoop DW that can handle big telco data with scalability & cost efficiency
【 Legacy IT Infrastructure 】
“High-price, High-performance Proprietary IT Infrastructure System”
- Data: Structured Data; Scale-up Structure (Terabyte)
- H/W: High Performance H/W (MPP, Fabric Switch, etc.)
- S/W: Proprietary S/W (RDBMS, etc.); Transaction/Batch Processing (SQL)
【 SKT DW Infrastructure 】
“Hadoop S/W and Commodity H/W Based Cost-effective IT Infrastructure System”
- Data: Structured/Un-structured Data; Scale-out Structure (Petabyte, Exabyte)
- H/W: Commodity H/W (x86 Server)
- S/W: Hadoop Architecture; SQL on Hadoop; Hadoop File System
※ MPP: Massively Parallel Processing, SAN: Storage Area Network, NAS: Network Attached Storage, RDBMS: Relational DB Management System
17. Use case 2: TANGO-D
§ Data scientists need a unified platform that collects data from all network equipment for management and analysis
§ Expected advantages
- Unification of 130+ legacy DBMSs, each of which managed a separate network monitoring system, enabling thorough analysis over the entire network
- Quick and accurate identification of root causes of network failure
[Figure: AS-WAS vs AS-IS. AS-WAS (siloed data & IT management): a separate DBMS for each NMS (NMS#1 … NMS#N-1) across the access, core, and transport networks. AS-IS (network enterprise DW): legacy NMSs (#1 … #N) with the legacy DW and new NMSs (#1 … #N) feed a Hadoop DW used by BI & Analytic…]
18. Use case 2: TANGO-D
§ TANGO-D is a Hadoop-based data warehouse built on Spark for various network statistics
or raw data
§ User Benefits
- End-to-end quality assurance and fault analysis
- Reduces analysis lead time (days → minutes)
- Saves TCO (1/5 less than legacy DW)
§ Hadoop DW
- Spark SQL functions and query optimizer
- Bulk loading and timely processing of large data (processing 2,500 tables per hour)
[Figure: TANGO-D data flow. Sources: access, core, transport, and IP network EMSs; T-Pani. Pipeline labels: ETL, ODS, Hadoop DW (DW data, data mart), SQL on Hadoop (Spark SQL), MQE (Meta Query Engine), analytics SQL, BI.]
19. Use case 3: Metatron Discovery
§ We developed the Metatron Discovery solution for quick and easy data analysis and applied it to our in-house big data system
§ Challenges
- Complex to analyze: separate software is needed for each step of data discovery
- Too slow for big data: no support for real-time analysis
- Lack of analytics functions and visualization charts for telecom analysis
§ Architecture
- Application (Visualization, Data Preparation, Workbench)
- Analysis & Analytics tools (Jupyter, Prediction, Clustering)
- Data Processing Engine (OLAP Engine)
- Big Data Storage, File system
§ Key Features
- Intuitive Analysis: makes it easy to analyze big data with end-to-end functionality from data preparation to analysis charts.
- Big OLAP Cube: minimizes ETL cost, speeds up analysis, and supports schema changes by combining various dimension data into a single large-capacity Big Mart.
- Sub-second Processing: by moving data across in-memory, local storage, and deep storage over time, it can respond quickly to large-capacity data of more than a terabyte.
- Advanced Analytics: provides analysis functions in conjunction with Jupyter, plus fast time-series forecasting and clustering with embedded analytics.
20. Use case 3: Metatron Discovery
§ Metatron Discovery enables end-to-end (E2E) analysis on a unified analytics platform
§ User Benefits
- Operational BI for network engineers and operators
- Works with Jupyter to perform advanced analysis
- Easy drill-down search through a drag-and-drop interface
[Figure: Metatron Discovery on TANGO-D. Users: executive officer, network operator, field engineer, biz. partner. Network domains: access, transport, core/ICT. Work areas: planning and investment strategy, engineering, construction, operation, work & TT management, network monitoring. Platform: N/W data repository, analytics platform, E2E inventory. Analysis types: operational BI, advanced analytics, data discovery.]
21. Use case 3: Metatron Discovery
§ Metatron's core engine is Druid, which can answer queries quickly by caching results at time granularity
[Figure: Druid query process. Druid cluster: broker, historical nodes, coordinator nodes, Zookeeper, metastore; segments in memory and on disk, cache entries, segment metadata. TANGO-D (Hadoop DW) cluster: HDFS (deep storage), Oozie. Example: a query for 2017-01-03 ~ 2017-01-08 finds result segments 2017-01-03/2017-01-04 and 2017-01-07/2017-01-08 in the cache (broker nodes); the part not in the cache is queried from the historical node (segments 2017-01-04/2017-01-05 and 2017-01-07/2017-01-08).]
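The per-segment caching idea in this query process can be sketched as follows. This is a simplified illustration, not Druid's code: the `query` and `day_segments` helpers are hypothetical, segments are assumed to be one day long, the query interval is treated as half-open, and the cached values are placeholder numbers.

```python
from datetime import date, timedelta

def day_segments(start, end):
    """Split the half-open interval [start, end) into one-day segments."""
    days = (end - start).days
    return [(start + timedelta(d), start + timedelta(d + 1)) for d in range(days)]

def query(start, end, cache, fetch_from_historical):
    """Answer a query segment by segment, reusing cached per-segment results.

    Only segments missing from the cache are fetched, and each fetched
    result is stored so that later queries can reuse it.
    """
    results, misses = {}, []
    for segment in day_segments(start, end):
        if segment in cache:
            results[segment] = cache[segment]
        else:
            misses.append(segment)
            results[segment] = cache[segment] = fetch_from_historical(segment)
    return results, misses

# The cache already holds the two result segments named on the slide.
cache = {
    (date(2017, 1, 3), date(2017, 1, 4)): 10,
    (date(2017, 1, 7), date(2017, 1, 8)): 40,
}
# The lambda is a placeholder for a call to a historical node.
results, misses = query(date(2017, 1, 3), date(2017, 1, 8), cache, lambda seg: 0)
```

Because results are cached per time segment rather than per whole query, a later query over an overlapping date range only pays for the segments it has not seen before.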
22. Use case 3: Metatron Discovery
§ Metatron Discovery consists of three parts (Workspace, Workbench, Jupyter). Each user can experience various analysis environments.
§ Workspace
- General Network Engineer & Operator
§ Workbench
- Advanced Analyst
§ Jupyter
- Statistical Analyst
[Figure: Metatron Discovery environments. TANGO-D (Hadoop DW cluster): Oozie, Spark SQL Thrift Server, YARN, SparkSQL, HDFS. Druid cluster: deep storage, historical nodes, real-time nodes, broker nodes, Zookeeper, coordinator nodes. Metatron Discovery: Workspace (fixed report, dynamic report), Workbench (SQL data analytics, direct query), Jupyter (R/Python, ad-hoc data analytics, direct query). Other labels: DW/Mart data batch, special-area synchronization (Sqoop).]
23. Containerized Environment of Analytics(Ongoing)
§ The analysis environment can be deployed as Docker containers, configured as individual analysis environments, with container resources managed as needed using Kubernetes and GlusterFS
[Figure: containerized analytics environment. Labels: K8S masters, K8S nodes (#1 … #N), Nginx, GlusterFS (private, shared), Docker Registry, container, provisioning, admin, user.]
24. Self-Data Preparation(Ongoing)
§ Data preparation makes it easy for anyone to do the tedious and repetitive ETL tasks that preprocess data for visualization and analysis
25. Self-Data Analytics(Ongoing)
§ Data analysts can interact with Metatron Discovery to run analytics and create REST APIs directly from Jupyter
26. Metatron
§ If you have any questions, please visit https://metatron.sktelecom.com/