Partha Saha and CW Chung (Visa)
Visa has embarked on an ambitious multi-year redesign of its entire data platform that powers its business. As part of this plan, the Apache Hadoop ecosystem, including HBase, will now become a staple in many of its solutions. Here, we will describe our journey in rolling out a high-availability NoSQL solution based on HBase behind some of our prominent mobile offerings.
Rolling Out Apache HBase for Mobile Offerings at Visa
HBaseCon 2016 | May 24, 2016
Partha Saha
pasaha@visa.com
CW Chung
cchung@visa.com
What this talk is about: a choice of NoSQL at Visa
• Scale: over 100 billion rows as history, from most recent
• Speed: millisecond response times for write/read
• Real-time: data loaded in real time
An example of a mobile offering
1. Add card to wallet
2. Pay for purchase
3. See your transaction right away, along with recent history (NoSQL is needed here)
We chose HBase as a NoSQL solution.
We built a scalable and real-time Transaction History Service.
We migrated prominent mobile wallet offerings to the Service.
This talk is about our learnings over the last year.
This talk …
1. We assume some knowledge of and familiarity with HBase.
2. We used HBase 1.0.0 with Cloudera Distribution CDH 5.4.3, so our observations are based on that version of HBase.
3. We cover the important learning events along the way of adopting HBase at Visa.
   – These can help new teams adopting HBase avoid the same pitfalls.
   – Our learning continues as we take on more interesting and challenging opportunities.
Is YCSB a good way to compare NoSQL options?
It is actually not…
• Unless you know how to configure each NoSQL option for optimal performance, the comparison can mislead.
• You may be driven to another solution because its performance seems “smoother” and easier to explain with rudimentary knowledge.
[Figure: two charts from the benchmark runs (vertical scales up to 60,000 and 40,000); axis titles and series names were not preserved.]
• It is, however, a great tool for observing how system configuration changes affect performance, and for exploring the configuration space for various workloads.
Our YCSB experience…
• Very easy to set up!
• We got a baseline of HBase performance on the cluster, and reran it after significant configuration and application code changes.
• Key parameters used:
– Number of client threads
– Number of operations
– Number of records in the data set
– Workload mix of read/update/insert (we added a 100% insert/update workload)
• We used a bash driver script to test various combinations of parameters.
• Latency measurement type can be histogram or timeseries. Both were useful.
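A minimal sketch of such a parameter sweep, written here in Python rather than bash. It only generates the YCSB command lines; the binding name, workload files, and every value in the lists are assumptions for illustration, not the settings we used.

```python
import itertools
import shlex

# All values below are illustrative assumptions, not the settings from the talk.
YCSB_BIN = "bin/ycsb"          # path inside a YCSB checkout (assumed)
BINDING = "hbase10"            # YCSB binding for HBase 1.0 (assumed)
WORKLOADS = ["workloads/workloada", "workloads/workloadb"]
THREADS = [8, 32, 128]
OPERATION_COUNTS = [1_000_000]
RECORD_COUNTS = [10_000_000]
MEASUREMENT_TYPES = ["histogram", "timeseries"]


def build_commands():
    """Return one YCSB 'run' command line per parameter combination."""
    commands = []
    for workload, threads, ops, records, mtype in itertools.product(
        WORKLOADS, THREADS, OPERATION_COUNTS, RECORD_COUNTS, MEASUREMENT_TYPES
    ):
        argv = [
            YCSB_BIN, "run", BINDING,
            "-P", workload,                       # workload property file
            "-threads", str(threads),             # number of client threads
            "-p", f"operationcount={ops}",        # number of operations
            "-p", f"recordcount={records}",       # records in the data set
            "-p", f"measurementtype={mtype}",     # histogram or timeseries
            "-s",                                 # periodic status output
        ]
        commands.append(" ".join(shlex.quote(a) for a in argv))
    return commands


if __name__ == "__main__":
    for line in build_commands():
        print(line)
```

The printed lines can be pasted into a shell or written to a driver script, so that each rerun after a configuration change uses exactly the same combinations.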
Should you design yourself out of major compactions?
Not worth the trouble when you are starting…
• An argument may be made that if we need an “N”-day rolling look-back, we can have daily tables that we create ahead of time and delete once they fall outside the look-back window. We can then reason about how to compact each daily file. Will that make the system operate better?
• Write amplification is a well-known problem and gets a lot of attention, but worrying about it during the early design stages seemed like premature optimization.
• We thought that we could always optimize later, through rolling compactions and diurnal patterns of traffic, once the patterns of reads and writes were fully understood.
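For readers unfamiliar with the daily-table idea that we decided not to pursue at the start, here is a rough sketch of the bookkeeping it implies. The table naming scheme and the one-day pre-creation margin are hypothetical; the function only plans which tables to create and drop and does not talk to HBase.

```python
from datetime import date, timedelta

TABLE_PREFIX = "txn_history_"  # hypothetical naming scheme


def table_name(day):
    """Name of the daily table for a given date, e.g. txn_history_20160524."""
    return TABLE_PREFIX + day.strftime("%Y%m%d")


def plan_daily_tables(today, lookback_days, existing, precreate_days=1):
    """Return (to_create, to_drop) for an N-day rolling look-back window.

    Tables are kept for the last `lookback_days` days (including today) and
    created `precreate_days` ahead so writers never hit a missing table.
    """
    first_kept = today - timedelta(days=lookback_days - 1)
    last_needed = today + timedelta(days=precreate_days)
    wanted = set()
    day = first_kept
    while day <= last_needed:
        wanted.add(table_name(day))
        day += timedelta(days=1)
    # Only consider tables that follow our naming scheme.
    owned = {t for t in existing if t.startswith(TABLE_PREFIX)}
    return sorted(wanted - owned), sorted(owned - wanted)
```

Dropping a whole table replaces per-row expiry, which is the reason the scheme is sometimes proposed as a way around major compactions.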
Does your design need transactional support?
We analyzed our secondary and primary key reads/writes.
[Diagram: facts are stored by primary key (pk1, pk2). Associations are stored by secondary key: sk1 → {pk1}, sk2 → {pk1, pk2}. Writes register associations; reads look up a secondary key to get the primary keys used to query for facts.]
• We concluded, by tracing reads and failures through updates, that inconsistencies were short-lived.
• We would have used a transaction support library otherwise.
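The kind of analysis described above can be illustrated with a small in-memory model. This is only a sketch: the class, the method names, and in particular the write order (fact first, association second) are assumptions made for illustration, not a description of our service.

```python
class TransactionStore:
    """In-memory stand-in for two tables: facts by primary key,
    associations by secondary key. Hypothetical, for illustration only."""

    def __init__(self):
        self.facts = {}          # pk -> fact
        self.associations = {}   # sk -> set of pk

    def put_fact(self, pk, fact):
        self.facts[pk] = fact

    def register_association(self, sk, pk):
        self.associations.setdefault(sk, set()).add(pk)

    def write(self, sk, pk, fact):
        # Two independent writes with no transaction around them.
        # Assumed order: fact first, so an association seldom points at nothing.
        self.put_fact(pk, fact)
        self.register_association(sk, pk)

    def query(self, sk):
        """Return facts for a secondary key, skipping any association
        whose fact is not visible yet (the short-lived inconsistency)."""
        pks = sorted(self.associations.get(sk, ()))
        return [self.facts[pk] for pk in pks if pk in self.facts]
```

In this model a failure between the two writes leaves a fact that is not yet reachable through the secondary key, and a retry of the same write repairs it, which is why the inconsistency window stays short.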
How do you learn about HBase hands-on without going into Production?
We built a Continuous Integration and Learning Environment
[Diagram components: Build Server, git/Stash, Bamboo, Artifactory Client, Chef Client, Test Server]
Bamboo plan:
- Checkout
- Build
- Upload
- Deploy
- Run test
How do you get Operations ready for HBase in Production?
We allocated one developer for 1 day/week to monitor production problems …
Sites: Bangalore, India and Foster City, CA, USA
1. We shadowed the real production.
2. Any production problem was given priority by the whole team.
3. We used 2 sites for 24x7 eyes.
4. We added alerts and monitoring dashboards.
5. We launched only when we met certain metrics.
Loading data in real-time as it is read
We used a micro-batch approach
[Diagram: two parallel pipelines, one for loading and one for notification, both fed by streams 1..N that are tailed.
Loading pipeline: Pre-Processor → Listing & Sender / Tracker → Loader Master (Receiver) → Loader Workers. Each worker runs a Batch Processor with an LLF Reader or Stream Reader and an HBase Load step.
Notification pipeline: Listing & Sender / Tracker → Notification Master (Receiver) → Notification Workers. Each worker runs a Batch Processor with an LLF Reader or Stream Reader, an HBase Registration Query (or HBase Query) step, and a Send Notification step.
Stages communicate over IPC. Micro-batches are 250 ms. Control and state files are read and written along the way. There is 1 Listing & Sender / Tracker per stream and 1..N workers per master.]
We had to build an approach to remember and retry from any point in each stream.
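One way to picture "remember and retry from any point" is a checkpoint per stream that only advances after a batch has been loaded. The sketch below is an assumption-laden simplification: it batches by record count instead of a 250 ms time window, keeps the state in a small JSON file, and `load_batch` is a hypothetical callback standing in for the HBase load.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the offset of the next unprocessed record (0 if no state file)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    """Persist the offset atomically so a crash never leaves a torn state file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)


def process_stream(records, state_path, load_batch, batch_size=100, max_attempts=3):
    """Load records in micro-batches, resuming from the saved checkpoint.

    The checkpoint moves forward only after a batch succeeds, so a batch may
    be loaded more than once after a crash (at-least-once delivery).
    """
    offset = load_checkpoint(state_path)
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        for attempt in range(1, max_attempts + 1):
            try:
                load_batch(batch)
                break
            except IOError:
                if attempt == max_attempts:
                    raise  # checkpoint untouched: the next run retries this batch
        offset += len(batch)
        save_checkpoint(state_path, offset)
    return offset
```

Because a batch can be replayed, this design assumes the load step is idempotent, for example because replaying a record rewrites the same row with the same content.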
The web-services Front End
[Diagram components: Gateway and Load Balancer; Web Service Wrapper; Rest Controller; API; Request Transform; Response Transform; Domain Objects; Audit Listener; BusinessComponent; DataService; HBaseAPI; HBase Plugin; HBase Cluster; Config Service; Subscription Service; Failover Service; Access Authorization; Encryption; Utility; Audit; Load Distribution Plugin; Cache; Audit DB; MQ]
We used 2 data centers to get availability
[Diagram: Data Center 1 and Data Center 2 each receive their own streams; non-native streams are replicated between them.]
We use shadow tables: when one data center is down, the other writes on its behalf into shadow tables, and the shadow tables are later drained so that the recovered data center can catch up.
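A toy model of the shadow-table idea, under loud assumptions: the `DataCenter` class, the in-memory dict and list, and the direct peer write are all hypothetical simplifications; the real system works on streams and HBase tables.

```python
class DataCenter:
    """Hypothetical stand-in for one site: a primary table plus a shadow table."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.table = {}    # primary table: row key -> value
        self.shadow = []   # writes held on behalf of the peer while it is down


def write(local, peer, key, value):
    """Write locally; forward to the peer, or park the write in the shadow table."""
    local.table[key] = value
    if peer.up:
        peer.table[key] = value
    else:
        local.shadow.append((key, value))


def drain_shadow(local, peer):
    """Replay parked writes in arrival order once the peer is back.

    Returns the number of writes replayed (0 if the peer is still down).
    """
    if not peer.up:
        return 0
    replayed = 0
    while local.shadow:
        key, value = local.shadow.pop(0)
        peer.table[key] = value
        replayed += 1
    return replayed
```

Replaying in arrival order keeps the last write for a key the same on both sides once the drain completes.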
Learning your Data Center clock
HBase is sensitive to clock skew…
• Kerberos services do not tolerate more than a few minutes of clock skew.
• Small skews generate warnings; large skews kill region servers.
Client retries & IOExceptions
• Default HBase timeout/retry settings can take tens of minutes to time out:
– hbase.rpc.timeout: 60 sec
– hbase.client.retries.number: 35
– hbase.client.pause: 100 msec (grows to 10 sec quickly after back-off)
– Even longer once you factor in potential retries by ZooKeeper!
– See the blogs by Lars Hofhansl: “HBase Client timeouts”, “HBase client response times”
• We chose a fail-fast strategy, because the end-user device does an end-to-end retry.
• Our timeout/retry settings: 1 sec timeout, 3 total tries.
– Works well within the same data center, as well as across data centers.
• However, once in a while, clients see IOExceptions!
– Caused by the Region Server (busy in GC, major/minor compaction, … ?)
– Or the network?
– Or the client itself?
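To see where "tens of minutes" comes from, the worst case can be estimated with a little arithmetic. The sketch below makes several assumptions: the back-off multiplier table is what we believe HBase 1.x uses (check HConstants in your version), the retries setting is treated as the total number of tries, jitter and ZooKeeper retries are ignored, and the fail-fast case assumes the pause stays at 100 msec.

```python
# Multipliers applied to hbase.client.pause on successive retries.
# Assumption: mirrors the back-off table in HBase 1.x; verify for your version.
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]


def worst_case_ms(rpc_timeout_ms, tries, pause_ms):
    """Upper bound for one client call: every try runs into the RPC timeout,
    with a back-off pause between consecutive tries."""
    total = tries * rpc_timeout_ms
    for i in range(tries - 1):
        # After the table is exhausted, the last multiplier keeps being used.
        multiplier = RETRY_BACKOFF[min(i, len(RETRY_BACKOFF) - 1)]
        total += pause_ms * multiplier
    return total


if __name__ == "__main__":
    default_ms = worst_case_ms(rpc_timeout_ms=60_000, tries=35, pause_ms=100)
    fail_fast_ms = worst_case_ms(rpc_timeout_ms=1_000, tries=3, pause_ms=100)
    print(f"defaults:  {default_ms / 60_000:.1f} minutes")
    print(f"fail fast: {fail_fast_ms / 1_000:.1f} seconds")
```

Under these assumptions the defaults add up to roughly 43 minutes per call, while 1 sec with 3 total tries stays near 3.3 seconds, which is short enough for the device to retry end to end.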
Correlating client exceptions
• Client side:
– Turn on HBase client debugging:
• log4j.logger.org.apache.hadoop.hbase.client=DEBUG
• log4j.logger.org.apache.hadoop.hbase.ipc=DEBUG
– Catch the exceptions and print out the specific Region Server name:
• IOException, RetriesExhaustedWithDetailsException
• Server side:
– Then look into the log of that specific Region Server.
• Works well when you know the specific server causing the IOExceptions.
– What if you do not?
Correlating client exceptions (continued)
• Build Root Cause Analysis software to:
– Collect the relevant logs from the sources:
• Client: application logs, HBase client logs, GC logs
• Hadoop server: HBase, HDFS, ZooKeeper server and GC logs
• Cluster events: Cloudera Manager API
• Other logs: KDC logs, Kerberos canary, network latency monitoring
– Parse the logs (single-line, multi-line text, JSON, XML) into CSV files.
– Normalize date and time formats, and apply date and time range filtering.
– Apply text filtering and text reduction on verbose lines.
– Output: events CSV, sorted by time and server, suitable for grep/awk/sort and Hive/SQL.
• Quickly get a complete view of the sequence of events across the various services.
• Sometimes this identifies the smoking gun (e.g. an exception caused by GC).
• Still useful in the few cases when no smoking gun can be found!
– Troubleshooting is also a process of elimination.
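The parse, normalize, filter, and sort steps can be sketched in a few functions. This is an illustration, not our tool: the two input formats (a log4j-style text line and a JSON line with `time`, `level`, `msg` fields) and all function names are assumptions, and real logs need one parser per source.

```python
import csv
import io
import json
import re
from datetime import datetime, timedelta

# Hypothetical log4j-style layout: "2016-05-24 09:00:01,250 WARN  message".
LOG4J_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) (\w+) +(.*)$")
EPOCH = datetime(1970, 1, 1)


def parse_log4j(server, source, text):
    """Parse log4j-style text; continuation lines (e.g. stack traces)
    are folded into the preceding event."""
    events = []
    for line in text.splitlines():
        m = LOG4J_LINE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            ts = ts.replace(microsecond=int(m.group(2)) * 1000)
            events.append([ts, server, source, m.group(3), m.group(4)])
        elif events:
            events[-1][4] += " | " + line.strip()
    return events


def parse_json_lines(server, source, text):
    """Parse one JSON object per line; 'time' is assumed to be epoch milliseconds."""
    events = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            ts = EPOCH + timedelta(milliseconds=obj["time"])
            events.append([ts, server, source, obj["level"], obj["msg"]])
    return events


def to_csv(events, start, end, max_len=200):
    """Keep events in [start, end], sort by time then server,
    truncate verbose messages, and return CSV text."""
    kept = sorted((e for e in events if start <= e[0] <= end),
                  key=lambda e: (e[0], e[1]))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "server", "source", "level", "message"])
    for ts, server, source, level, msg in kept:
        writer.writerow([ts.isoformat(timespec="milliseconds"),
                         server, source, level, msg[:max_len]])
    return out.getvalue()
```

With every source reduced to the same five columns, a client-side IOException and a long GC pause on a region server end up on neighbouring lines of the same file.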
Kerberos Gotchas – what we have learned
• Use FQDNs (Fully Qualified Domain Names, like server123.abc.com) for hostnames.
• Use TCP rather than UDP (set udp_preference_limit = 1 in krb5.conf).
• KDC (MIT Kerberos) server:
– Configure it to start several KDC processes to handle bursty traffic (use the -w option).
– Set up a backup KDC for higher availability.
• Debugging tips:
– $ export KRB5_TRACE=/dev/stderr (or to a file)
– JVM option: -Dsun.security.krb5.debug=true
• Kerberos support is built into the Java JRE, using internal classes:
– Oracle JDK: com.sun classes; IBM JDK on AIX: com.ibm classes
– Hadoop is built and tested against the Oracle JDK (mileage on the AIX JDK varies).
• Good references (besides the usual documents on Kerberos and the HBase user mailing list):
– Steve Loughran: Hadoop and Kerberos: The Madness beyond the Gate.
– HBase and Hadoop common source code: UserGroupInformation.java.
Kerberos Gotchas – what we learned (continued)
Renewing a TGT (Ticket Granting Ticket)
• After a successful kinit, the application principal gets a Kerberos TGT.
• By default, the TGT is good for 10 hours.
• For long-running applications, 10 hours is obviously not enough: the TGT needs to be renewed.
• Initially we used a process/thread to do a kinit once every few hours.
– We still ran into some IOExceptions at the time of TGT renewal.
– This is not the recommended way for long-running applications.
• Now we use the UGI API (UserGroupInformation): loginUserFromKeytab().
– Does not require a separate process/thread to do TGT renewal.
– The Hadoop/HBase client class library catches the exception due to TGT expiration and calls reloginFromKeytab() to renew the TGT automatically.
– We are also considering spawning a thread that proactively invokes checkTGTAndReloginFromKeytab().
– Ongoing investigation: clients occasionally still experience a momentary IOException around the time of ticket renewal.
• Referral tickets: when one realm is set up to trust another realm, be aware of the additional KDC calls that result when the kinit principal is from the trusted realm.
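The proactive thread we are considering needs a rule for when a re-login is due. A minimal sketch of such a timing check follows; the 80% window and both function names are assumptions for illustration, and the actual re-login would still be done through the UGI API in the Java client.

```python
from datetime import datetime, timedelta

RENEW_WINDOW = 0.8  # assumed policy: refresh once 80% of the ticket lifetime has elapsed


def refresh_time(start, end, window=RENEW_WINDOW):
    """Return the instant after which the TGT should be refreshed."""
    return start + (end - start) * window


def should_relogin(now, start, end, window=RENEW_WINDOW):
    """True when a proactive re-login from the keytab is due."""
    return now >= refresh_time(start, end, window)


if __name__ == "__main__":
    issued = datetime(2016, 5, 24, 0, 0, 0)
    expires = issued + timedelta(hours=10)   # the 10-hour lifetime mentioned above
    print("refresh after:", refresh_time(issued, expires))
```

Refreshing well before expiry leaves time for a retry if the KDC is briefly unavailable, instead of discovering the expired ticket on a live request.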
Garbage Collection
• We use G1 on Oracle JDK 1.8.
• We basically use the settings recommended at HBaseCon 2015 by Eric Kaczmarek, Yanping Wang, and Liqi Yi.
• Set target GC pause to 100 msec; Young Gen to ~1GB.
• Our observations are consistent with their published results:
– Observed GC time in production:
• 100 msec or less: 67%
• 400 msec or less: 99.8%
• It is important to track the actual production GC time, as the Production and Test clusters show somewhat different distributions.
GC Duration comparison: production vs perf cluster
GC: How Good is MaxGCPauseMillis as a Target?
MaxGCPauseMillis = 100; all GC times in msec.

Metric                                           Production Cluster           Test Cluster
# of GC events                                   165192                       199883
Avg / Std Dev / Max                              87.1 / 64.9 / 1530           81.9 / 37.2 / 1370
50th percentile (median)                         80                           90
95th / 99th / 99.9th / 99.99th percentile        210 / 270 / 450 / 660        120 / 140 / 510 / 780
% of events within 100 / 200 / 300 / 400 msec    67% / 95% / 99.4% / 99.8%    85% / 99.4% / 99.6% / 99.8%
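Statistics of this kind can be derived from a list of GC pause durations extracted from the GC logs. The sketch below is a minimal version assuming the pauses are already parsed into milliseconds; it uses the nearest-rank percentile definition, which may differ slightly from whatever tool produced the figures above.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct% of the samples are at or below it."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def share_at_or_below(values, threshold):
    """Percentage of samples that are <= threshold."""
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


def summarize(pauses_ms, target_ms=100):
    """Summarize GC pauses against a pause target such as MaxGCPauseMillis."""
    data = sorted(pauses_ms)
    return {
        "count": len(data),
        "max": data[-1],
        "p50": percentile(data, 50),
        "p99": percentile(data, 99),
        "within_target_pct": share_at_or_below(data, target_ms),
    }
```

Running the same summary on production and test logs makes the difference between the two distributions visible, rather than relying on the pause target alone.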
Adopting an open source product is a journey…
• Learning from previous adoption successes is crucial. If a use case has not been tried, analyzed, or written about before, chances are we will have to pay for the learning, and having alternate choices is a good idea.
• Making only one major technology change at a time is always a good idea.
• Setting appropriate expectations through team members and agile processes is important.
• Going to a production scenario early, in shadow mode, and learning through frequent releases is helpful.
• We believe extra capacity for peak workloads was very helpful.
• Having the source code is very useful in learning and troubleshooting.
It Takes a Village! Thank you!
Alexandr Peyko
Amit Sharma
Anthony Chu
Arindam Chakraborty
Artem Savinov
Aviral Agarwal
Bala Saravanan Kannan
Ben Crane
Carl Duque
Chetan Talanki
Debasis Mullick
Deepankar Palit
Hong Zhu
Igor Karpenko
Igor Peller
Igor Ulianitski
Jay Gardner
Jim Gordon
Karthikeyan Manickavasagan
Liang Gao
Murali Reddy
Nandakumar Jayakumar
Nimish Shah
Peter Meigs
Pradyot Sikdar
Praveen Rudraraju
Rajat Raj
Raj Merchia
Ralph Blore
Ranjan Dutta
Ricardo De Ocampo Domingo
Robert Walsh
Sabu Peter
Sam Hamilton
Sandeep Reddy
Satyaban Nandi
Soumya Das
Srijoy Aditya
Srinivas Reddy Surasani
Suchismita Nayak
Suresh Pulikara
Ujjwal Kumar
Vikash Talanki
Vinay Sarda
Waqar Hasan
Winnie Chau
Xuepeng (Hans) Li
Yanyan Hao
Yusuf Rahaman
Amandeep Khurana
Jeongho Park
Jugoslav Djajic
Justin Hayes
Michael Stack