Because big data systems vary so much in structure, they can be difficult to query and even more difficult to explore. Hue, a Django-driven web application, integrates with these components and provides a clean, easy-to-use interface. In this discussion, we'll cover how the Hue project approached communicating with HBase, HDFS, and various query engines. We'll also cover the reasons behind these design decisions.
2. What is Hue?
HUE 1
Desktop-like in a browser. It did its job, though it was pretty slow, leaked memory, and was not very IE-friendly. Still, it was definitely advanced for its time (2009-2010).
Monday, March 3, 14
3. HISTORY
HUE 2
The first port to a flat structure, with Twitter Bootstrap all over the place.
4. HISTORY
HUE 2.5
New apps and an improved UX, with nice new features like autocomplete and drag & drop.
8. APPS
[Figure: diagram of the Hue apps, including Home, File Browser, Job Browser, Job Designer, Metastore, Hive, Impala, Pig, Oozie, HBase, Search, Sqoop, ZooKeeper, Spark, DB Query, and User Admin]
12. HADOOP INTERFACES
REST & THRIFT
Hue talks to many Hadoop interfaces over REST and Thrift.
CUSTOM CLIENTS
Provide custom clients for more explicit API definitions.
WebHDFS
YARN API (RM, NM, MR...)
HiveServer2
Impala
HBase
Oozie
Sqoop2
ZooKeeper
...
13. PROTOCOLS
REST
Use python-requests and a custom client to streamline calls to RESTful interfaces.
Thrift
Use custom connection pooling and socket multiplexing to streamline Thrift calls.
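Hue's own pooling code is not shown on this slide. As a minimal, generic sketch of the connection-pooling idea using only the standard library: `ConnectionPool`, `factory`, and `size` are hypothetical names, and the stand-in factory returns plain objects rather than Thrift transports.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool that reuses client connections."""

    def __init__(self, factory, size=5):
        self._factory = factory          # callable that opens one connection
        self._size = size
        self._created = 0
        self._idle = queue.LifoQueue()   # hand out the most recently used first
        self._lock = threading.Lock()

    def _acquire(self, timeout):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._size:
                conn = self._factory()
                self._created += 1
                return conn
        # Pool is at capacity: wait for another caller to release one.
        return self._idle.get(timeout=timeout)

    @contextmanager
    def connection(self, timeout=None):
        conn = self._acquire(timeout)
        try:
            yield conn
        finally:
            # Return the connection for reuse instead of closing it.
            self._idle.put(conn)


opened = []


def open_connection():
    """Stand-in factory; a real one would open a Thrift transport."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(open_connection, size=2)
with pool.connection() as first:
    pass
with pool.connection() as second:
    pass  # reuses the connection released above
```

Sequential callers share one connection, so only one is ever opened. A production pool would also need to detect and discard broken transports before putting them back.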
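import logging
from desktop.lib.rest import http_client
from hadoop.fs.exceptions import WebHdfsException
LOG = logging.getLogger(__name__)
# Illustrative wrapper: build a WebHDFS REST client, adding Kerberos auth when security is on.
def _make_client(url, security_enabled):
  client = http_client.HttpClient(url, exc_class=WebHdfsException,
                                  logger=LOG)
  if security_enabled:
    client.set_kerberos_auth()
  return client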
thrift_util.get_client(TCLIService.Client,
                       query_server['server_host'],
                       query_server['server_port'],
                       service_name=query_server['server_name'],
                       kerberos_principal=kerberos_principal_short_name,
                       use_sasl=use_sasl,
                       mechanism=mechanism,
                       username=user.username,
                       timeout_seconds=conf.SERVER_CONN_TIMEOUT.get(),
                       use_ssl=conf.SSL.ENABLED.get(),
                       ca_certs=conf.SSL.CACERTS.get(),
                       keyfile=conf.SSL.KEY.get(),
                       certfile=conf.SSL.CERT.get(),
                       validate=conf.SSL.VALIDATE.get())
14. ACCESSIBILITY
Middleware
Make Hadoop interfaces accessible through request objects.
class ClusterMiddleware(object):
  def process_view(self, request, ...):
    request.fs = cluster.get_hdfs(request.fs_ref)
    if request.user.is_authenticated():
      if request.fs is not None:
        request.fs.setuser(request.user.username)

def download(request, path):
  if not request.fs.exists(path):
    raise Http404(_("File not found."))
  if not request.fs.isfile(path):
    raise PopupException(_("not a file."))
16. HDFS - Communication
REST
The NameNode provides a RESTful server called WebHDFS.
Explicit Client
Provide an explicit API.
Request Accessible
Provide a middleware that populates a member of the request (request.fs).
http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
...
class WebHdfs(Hdfs):
  def create(self, path, ...):
    ...
  def read(self, path, ...):
    ...

def download(request, path):
  if not request.fs.exists(path):
    raise Http404(_("File not found."))
  if not request.fs.isfile(path):
    raise PopupException(_("not a file."))
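A minimal sketch, assuming only the `/webhdfs/v1/<PATH>?op=...` URL layout listed on this slide, of what an explicit client buys over hand-built query strings: one method per WebHDFS operation. `WebHdfsUrls` is a hypothetical name, and it only builds URLs; it sends no requests. `offset`, `length`, `overwrite`, and `user.name` are standard WebHDFS query parameters. The host and port in the usage line are placeholders.

```python
from urllib.parse import quote, urlencode


class WebHdfsUrls:
    """Hypothetical helper: one explicit method per WebHDFS operation."""

    def __init__(self, host, port, user=None):
        self._base = "http://%s:%d/webhdfs/v1" % (host, port)
        self._user = user

    def _url(self, path, op, **params):
        if not path.startswith("/"):
            raise ValueError("WebHDFS paths must be absolute: %r" % path)
        query = {"op": op}
        if self._user:
            # user.name is WebHDFS's parameter for simple (non-Kerberos) auth.
            query["user.name"] = self._user
        query.update(params)
        return "%s%s?%s" % (self._base, quote(path), urlencode(query))

    def create(self, path, overwrite=False):
        return self._url(path, "CREATE", overwrite=str(overwrite).lower())

    def read(self, path, offset=0, length=None):
        params = {"offset": offset}
        if length is not None:
            params["length"] = length
        return self._url(path, "OPEN", **params)


hdfs = WebHdfsUrls("namenode", 50070, user="hue")
```

Callers then write `hdfs.read(path, length=4)` instead of remembering that reading is `op=OPEN`, which is the kind of explicitness the slide argues for.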
17. HDFS - Cool Things
MIME Type Detection
Detect the kind of file being read: Avro, GZIP, etc.
Pagination
Paginate by block size when viewing a file (soon to work more like a PDF reader, with content added automatically).
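Both features can be sketched in a few lines, as an illustration rather than Hue's actual code. Type detection here looks at well-known magic numbers (GZIP starts with `1f 8b`, Avro container files with `Obj\x01`). Pagination turns a page number into an `(offset, length)` pair, the same shape as WebHDFS `OPEN`'s `offset` and `length` parameters. The `MAGIC_NUMBERS` table, the `"text"` fallback, and both function names are assumptions.

```python
# Hypothetical magic-number table; real detection would cover more formats.
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"Obj\x01", "avro"),
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_type(head):
    """Guess a file's type from its first bytes; fall back to plain text."""
    for magic, kind in MAGIC_NUMBERS:
        if head.startswith(magic):
            return kind
    return "text"


def page_range(page, file_size, page_size):
    """Return (offset, length) for a 1-based page, or None past the end of file."""
    if page < 1:
        raise ValueError("pages are 1-based")
    offset = (page - 1) * page_size
    if offset >= file_size:
        return None
    # The last page may be shorter than a full block.
    return offset, min(page_size, file_size - offset)
```

Only the first few bytes and one page of data need to be fetched, so large files can be previewed without reading them whole.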
19. HBase - Technical Risk
2 Dimensions
Effectively unbounded numbers of rows and columns.
Sparseness
Column names often differ from row to row.
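To see why sparseness is a risk for a table view, this sketch uses made-up sample rows (hypothetical data, with each row mapping `family:qualifier` to a value) and measures how much of a naive flat grid would be empty. `sparseness` is a hypothetical helper name.

```python
# Hypothetical sample: each row maps "family:qualifier" -> value,
# and different rows carry different columns.
rows = {
    "row1": {"info:name": "a", "info:email": "a@example.com"},
    "row2": {"info:name": "b", "stats:visits": "7"},
    "row3": {"stats:last_login": "2014-03-01"},
}


def sparseness(rows):
    """Columns a flat grid would need, and the fraction of its cells left empty."""
    columns = sorted({col for cells in rows.values() for col in cells})
    total_cells = len(rows) * len(columns)
    filled_cells = sum(len(cells) for cells in rows.values())
    empty_ratio = 1 - filled_cells / total_cells if total_cells else 0.0
    return columns, empty_ratio


columns, empty_ratio = sparseness(rows)
```

Even with three rows, 7 of the 12 grid cells are empty. With many rows and per-row column names, a grid that shows every column mostly displays nulls.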
20. HBase - Communication
Thrift
Communicate with HBase over Thrift for better filtering.
Explicit Client
Provide an explicit API.
class HBaseApi(object):
  def createTable(self, cluster, tableName, ...):
    ...
  def getRows(self, cluster, tableName, columns, ...):
    ...
21. HBase - Results
Improved View
Intelligent view that collapses null cells.
Better Search
Improved searchability of HBase via flexible search.
MIME Type Detection
View documents stored in HBase, such as PDFs and images.
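One way to read "collapses null cells" is sketched here as an assumption, not Hue's view code. Given a row-by-column grid that uses `None` for missing cells, keep only the non-null cells of each row, and only the columns that hold a value somewhere. `collapse_nulls` and the sample grid are hypothetical.

```python
def collapse_nulls(grid):
    """Drop None cells per row, and list only columns that hold a value somewhere."""
    collapsed = {
        key: {col: val for col, val in row.items() if val is not None}
        for key, row in grid.items()
    }
    columns = sorted({col for row in collapsed.values() for col in row})
    return columns, collapsed


# Hypothetical grid: every row lists every column, most of them empty.
grid = {
    "row1": {"info:name": "a", "info:email": None, "stats:visits": None},
    "row2": {"info:name": "b", "info:email": None, "stats:visits": "7"},
}
columns, collapsed = collapse_nulls(grid)
```

Here `info:email` disappears entirely, and `row1` shows a single cell instead of three.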
23. Hive - Communication
Thrift
Communicate with HiveServer2 over Thrift.
Explicit Client
Provide a higher-level API that is explicit and easy to configure.
DBMS
Extend the capabilities of the DBMS layer in Hue.
thrift_util.get_client(TCLIService.Client,
                       query_server['server_host'],
                       query_server['server_port'],
                       service_name=query_server['server_name'],
                       ...)

class HiveServerClient:
  HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN',
                    'NOSASL': 'NOSASL'}

  def __init__(self, query_server, user, ...):
    thrift_util.get_client(TCLIService.Client,
    ...

class HiveServer2Dbms(object):
  def get_databases(self):
    return self.client.get_databases()
  ...
  def select_star_from(self, database, table):
    hql = "SELECT * FROM `%s.%s` %s" % (database, table.name,
                                        self._get_browse_limit_clause(table))
    return self.execute_statement(hql)
  ...
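The `HS2_MECHANISMS` table maps HiveServer2 authentication modes to SASL mechanism names. This minimal sketch shows how a client might turn a configured mode into the `use_sasl` and `mechanism` arguments passed to `thrift_util.get_client`. It assumes the mode arrives from configuration as a plain string. `get_security` is a hypothetical helper, not necessarily how Hue does it.

```python
HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}


def get_security(auth_setting):
    """Hypothetical: map a configured HiveServer2 auth mode to (use_sasl, mechanism)."""
    key = auth_setting.upper()
    if key not in HS2_MECHANISMS:
        raise ValueError("unsupported HiveServer2 authentication: %r" % auth_setting)
    mechanism = HS2_MECHANISMS[key]
    # NOSASL means a raw Thrift socket; every other mode wraps the socket in SASL.
    use_sasl = mechanism != 'NOSASL'
    return use_sasl, mechanism
```

Keeping the mapping in one table means supporting a new mode is a one-line change rather than another branch at every connection site.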
24. Hive - Results
One Page App
Intelligent view that lets users focus on their queries.
Secure
Achieves a degree of security through SASL, Kerberos, and SSL.
Navigation
Navigate databases and tables easily.
26. Missed something?
GET STARTED
Take a closer look at REST and Thrift communication in Hue
The inner workings of the Filebrowser
The fundamentals of the HBase browser
The concepts behind the Beeswax app
27. What else does Hue do with Django?
Extensible settings
Configuration of settings.py provided through hue.ini.
Security
Configurable session timeouts, SAML authentication, etc.
Doc Model
Polymorphic documents via a base document model.
Authentication
LDAP, PAM, OAuth, etc. provided through authentication backends.
Permissions
Per-app permissions configurable in the UserAdmin.
Testing
Mocked and functional tests via nose + django-nose.
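The hue.ini-to-settings.py idea can be sketched with the standard library. This is a simplified, hypothetical illustration. Hue's real hue.ini uses nested `[[...]]` sections and is parsed with configobj, which `configparser` cannot read. `SETTINGS_MAP` and `load_settings` are made-up names, and the two keys are chosen to resemble ones in Hue's `[desktop]` section.

```python
import configparser

# Simplified, hypothetical excerpt of an ini file (no nested sections).
SAMPLE_INI = """
[desktop]
secret_key=change-me
time_zone=America/Los_Angeles
"""

# Hypothetical mapping from (section, key) in the ini file to Django setting names.
SETTINGS_MAP = {
    ("desktop", "secret_key"): "SECRET_KEY",
    ("desktop", "time_zone"): "TIME_ZONE",
}


def load_settings(text, defaults):
    """Start from Django defaults and override whatever the ini file sets."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = dict(defaults)
    for (section, key), name in SETTINGS_MAP.items():
        if parser.has_option(section, key):
            settings[name] = parser.get(section, key)
    return settings


settings = load_settings(SAMPLE_INI, {"TIME_ZONE": "UTC", "DEBUG": False})
```

Administrators then edit one ini file, and settings.py never has to change for per-deployment values.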
28. GET HUE
CLOUDERA’S CDH
Stable and highly tested releases, perfectly integrated with the Hadoop ecosystem and automagically configured by Cloudera Manager.
TARBALL
Try the latest and greatest in advance, but you’ll have to configure everything on your own.
CLOUDERA’S DEMO VM
Play with Hue and various Hadoop components in 5 minutes. It’s a self-contained CDH environment, ready to use.
HORTONWORKS*
HDP ships an old, forked version of Hue 2.3.
MAPR*
A newer version than HDP’s, close to the original 2.5 minus apps like HBase, Impala, Sqoop, and Search.
HP CLOUD*
The newest addition; ships Hue 3.0 through the GreenButton products.
BIGTOP
EMBEDDED/DEMO IN IND. COMPANIES
* YOUR MILEAGE MAY VARY.