Timeline service V2 at the Hadoop Summit SJ 2016

(Big Data)2
How YARN Timeline Service v.2 Unlocks 360-Degree
Pla@orm Insights at Scale
Sangjin Lee @sjlee (Twi5er)
Li Lu (Hortonworks)
Vrushali Channapa5an @vrushalivc (Twi5er)

Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• SeIng up Timeline Service v.2
• Milestones
• Demo

Why v.2?
• YARN Timeline Service v 1.x
• Gained good adopSon: Tez, HIVE, Pig, etc.
• Keeps improving with v 1.5 APIs and storage implementaSon
• SSll facing some fundamental challenges...

Why v.2?
• Scalability and reliability challenges
• Single instance of Timeline Server
• Storage (single local LevelDB instance)
• Usability
• Flow
• Metrics and conﬁguraSon as ﬁrst-class ciSzens
• Metrics aggregaSon up the enSty hierarchy

Highlights
v.1 v.2
Single writer/reader Timeline Server Distributed writer/collector architecture
Single local LevelDB storage* Scalable storage (HBase)
v.1 enSty model New v.2 enSty model
No aggregaSon Metrics aggregaSon
REST API Richer query REST API

Architecture
• SeparaSon of writers (“collectors”) and readers
• Distributed collectors: one collector for each app
• Dedicated RM collector for RM-generated data
• Collector discovery via RM
• Pluggable storage with HBase as default storage

Distributed collectors & readers
!meline
reader
!meline
reader
Storage
!meline
reader
AM
!meline
collector
NM
!meline reader pool
app metrics/events
container events/metrics
RM
!meline collector
app/container events
user queries
(worker node running AM)
(worker node running containers)
write ﬂow
read ﬂow

Collector discovery
RM
AM
app id => address
! start AM container
NM
3meline
collector
" node heartbeat
# allocate response
worker node
3meline
client

New enSty model
• Flows and ﬂow runs as parents of YARN applicaSon enSSes
• First-class conﬁguraSon (key-value pairs)
• First-class metrics (single-value or Sme series)
• Designed to handle mulS-cluster environment out of the box

What is a ﬂow?
• A ﬂow is a group of YARN
applicaSons that are launched as
parts of a logical app
• Oozie, Scalding, Pig, etc.
• name:
“frequent_visitor_stat”
• run id: 1466097809000
• version: “b9b9068”

ConfiguraSon and metrics
• Now explicit top-level a5ributes of
enSSes
• Fine-grained updates and queries
made possible
• “update metric A to value x”
• “query enMMes where config A = B”
container 1_1
metric: A = 10
metric: B = 100
config: "Foo" = "bar"

ConfiguraSon and metrics
• Now explicit top-level a5ributes of
enSSes
• Fine-grained updates and queries
made possible
• “update metric A to value x”
• “query enMMes where config A = B”
container 1_1
metric: A = 50
metric: B = 100
config: "Foo" = "bar"

HBase Storage
• Scalable backend
• Row Key structure
• efficient range scans
• KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregaSon (“readless” aggregaSon)
• Cell tags for metadata (applicaSon id, aggregaSon operaSon)
• Cell Smestamps generated during put
• lei shiied with app id added to avoid overwrites

Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow

table: flow run
Row key:
clusterId!userName!
flowName!
inverted(flowRunId)
most recent flow run stored first
coprocessor enabled

table: applicaSon
Row key:
clusterId!userName!
flowName!
inverted(flowRunId)!
AppId
applicaSons within a flow run stored
together
most recent flow run stored first

table: enSty
Row key:
userName!clusterId!flowName!
inverted(flowRunId)!AppId!entityType!
entityId
enSSes within an applicaSon within a ﬂow run stored together per
type
• for example, all containers within a yarn applicaSon will be
stored together
pre-split table
stores information per entity run like info, relatesTo, relatedTo,
events, metrics, conﬁg

table: flow acSvity
Row key:
clusterId!
inverted(TopOfTheDay)!
userName!flowName
shows the flows that ran on that day
stores informaSon per flow like number of
runs, the run ids, versions

table: appToFlow
Row key:
clusterId!appId
- stores mapping of appId to
flowName and flowRunId

Metrics aggregaSon
• ApplicaSon level
• Rolls up sub-applicaSon metrics
• Performed in real Sme in the collectors in memory
• Flow run level
• Rolls up app level metrics
• Performed in HBase region servers via coprocessors
• Offline aggregaSon (TBD)
• Rolls up on user, queue, and flow offline periodically
• Phoenix tables
Container 1_1
“bytes” : 23
Container 1_2
“bytes” : 135
Container 2_1
“bytes” : 50
Container 3_1
“bytes” : 64
App1
“bytes”: 158
App2
“bytes”: 50
App3
“bytes”: 64
flow1
“bytes”: 208
flow2
“bytes”: 64
user1
“bytes”: 272
queue1
“bytes”: 272
App
aggregation
In collector
flow
aggregation
In hbase
offline
aggregation

FlowRun
Aggrega:on
via the HBase
Coprocessor
App
Metrics
Cells
in
HBase
FlowRun
Metric
Sum

App
Metrics
Cells
in
HBase
FlowRun
Metric
Sum
FlowRun
Aggrega:on
via the HBase
Coprocessor

Reader REST API: paths
• URLs under /ws/v2/Smeline
• Canonical REST style URLs: /ws/v2/Smeline/clusters/cluster_name/
users/user_name/flows/flow_name/runs/run_id
• Path elements may be omi5ed if they can be inferred
• flow context can be inferred by app id
• default cluster is assumed if cluster is omi5ed

Reader REST API: query params
• limit, createdTimeStart, createdTimeEnd: constrain the enSSes
• ﬁelds (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO |
IS_RELATED_TO): limit the contents to return
• metricsToRetrieve, confsToRetrieve: further limit the contents to
return
• metricsLimit: limits the number of values in a Sme series

Reader REST API: query params
• relatesTo, isRelatedTo: filters by associaSon
• *Filters: filters by info, config, metric, event, …
• Supports complex filters including operators
• metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt
20))

Developing: TimelineClient
In your application master:
// create TimelineClient v.2 style
TimelineClient client = TimelineClient.createTimelineClient(appId);
client.init(conf);
client.start();
// bind it to AM/RM client to receive the collector address
amRMClient.registerTimelineClient(client);
// create and write timeline entities
TimelineEntity entity = new TimelineEntity();
client.putEntities(entity);
// when the app is complete, stop the timeline client
client.stop();

Developing: Flow context
In your app submitter:
ApplicationSubmissionContext appContext =
app.getApplicationSubmissionContext();
// set the flow context as YARN application tags
Set<String> tags = new HashSet<>();
tags.add(TimelineUtils.generateFlowNameTag("distributed grep"));
tags.add(TimelineUtils.generateFlowVersionTag(
"3df8b0d6100530080d2e0decf9e528e57c42a90a"));
tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));
appContext.setApplicationTags(tags);

SeIng up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
• Add the Smeline service jar to HBase
• Install the ﬂow run coprocessor
• Create tables via TimelineSchemaCreator uSlity
• Conﬁgure the YARN cluster
• Enable Timeline Service v.2
• Add hbase-site.xml for the Smeline collector and readers
• Start the Smeline reader daemon

Milestone 1 ("Alpha 1")
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write
ﬂow
✓ Real Sme applicaSon and ﬂow
aggregaSon
✓ New enSty model
✓ HBase Storage
✓ Rich REST API
✓ IntegraSon with Distributed Shell
and MapReduce
✓ YARN generic events and system
metrics

Milestones - Future
• Milestone 2 (“Alpha 2”)
• IntegraSon with new YARN
UI
• IntegraSon with more
frameworks
• Beta
• Freeze API and storage schema
• Security
• Collectors as containers
• Storage fault tolerance
• ProducSon-ready
• MigraSon-ready

Contributors
• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapa5an, Joep RoInghuis (Twi5er)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!

Timeline service V2 at the Hadoop Summit SJ 2016

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (20)

Similaire à Timeline service V2 at the Hadoop Summit SJ 2016

Similaire à Timeline service V2 at the Hadoop Summit SJ 2016 (20)

Dernier

Dernier (20)

Timeline service V2 at the Hadoop Summit SJ 2016