Django+NoSQL
How Hue Integrates with Hadoop
Abraham Elmahrek
Cloudera - March 5th, 2014

Monday, March 3, 14
What is Hue?
HUE 1

Desktop-like in a browser. It did its job but was pretty slow, leaked memory, and was not very IE-friendly, though it was definitely advanced for its time (2009-2010).

HISTORY
HUE 2

The first flat structure port,
with Twitter Bootstrap all
over the place.

HISTORY
HUE 2.5

New apps and an improved UX, with nice new features like autocomplete and drag & drop.

HISTORY
HUE 3 ALPHA

Proposed design, didn’t
make it.

HISTORY
HUE 3

Transition to the new UI,
major improvements and
new apps.

HISTORY
HUE 3.5+

APPS

[Figure: grid of Hue app icons; the rotated icon labels were scrambled during extraction]

APPS

Hue Plugins

YARN
JobTracker
Pig
Oozie
Cloudera Impala
HiveServer2
HDFS
Hive Metastore
HBase
Solr
ZooKeeper
Sqoop2
LDAP
SAML
FAST PACE
LAST MONTH

91 issues created and 90
resolved.
Core team + Community

STACK
BACKEND
Python 2.6+ with Django 1.4.5

FRONTEND
jQuery
Bootstrap
Knockout.js
Love
HADOOP INTERFACES
REST & THRIFT

Many Hadoop interfaces used:

WebHDFS
YARN API (RM, NM, MR...)
HiveServer2
Impala
HBase
Oozie
Sqoop2
ZooKeeper
...

CUSTOM CLIENTS

Provide custom clients for more explicit API definitions
PROTOCOLS
REST

Use python-requests and a custom client to streamline RESTful interface calls.

    # Excerpt from a client factory function: build the HTTP client,
    # then switch on Kerberos authentication if the cluster is secured.
    client = http_client.HttpClient(url,
                                    exc_class=WebHdfsException,
                                    logger=LOG)
    if security_enabled:
        client.set_kerberos_auth()
    return client

Thrift

Custom connection pooling and socket multiplexing to streamline Thrift calls.

    thrift_util.get_client(TCLIService.Client,
                           query_server['server_host'],
                           query_server['server_port'],
                           service_name=query_server['server_name'],
                           kerberos_principal=kerberos_principal_short_name,
                           use_sasl=use_sasl,
                           mechanism=mechanism,
                           username=user.username,
                           timeout_seconds=conf.SERVER_CONN_TIMEOUT.get(),
                           use_ssl=conf.SSL.ENABLED.get(),
                           ca_certs=conf.SSL.CACERTS.get(),
                           keyfile=conf.SSL.KEY.get(),
                           certfile=conf.SSL.CERT.get(),
                           validate=conf.SSL.VALIDATE.get())
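
The connection pooling idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Hue's thrift_util code: ClientPool, its factory argument, and FakeClient are hypothetical names, and a real pool would also deal with timeouts, broken sockets, and SASL setup.

```python
import queue
from contextlib import contextmanager


class ClientPool:
    """Keep a bounded set of reusable clients per (host, port) pair."""

    def __init__(self, factory, max_size=4):
        self._factory = factory      # hypothetical: callable(host, port) -> client
        self._max_size = max_size
        self._pools = {}             # (host, port) -> queue of idle clients

    @contextmanager
    def client(self, host, port):
        idle = self._pools.setdefault((host, port), queue.Queue(self._max_size))
        try:
            conn = idle.get_nowait()           # reuse an idle connection
        except queue.Empty:
            conn = self._factory(host, port)   # none idle: open a new one
        try:
            yield conn
        finally:
            try:
                idle.put_nowait(conn)          # hand it back for reuse
            except queue.Full:
                pass                           # pool already full: drop this client


class FakeClient:
    """Stand-in for a Thrift client so the sketch runs without a server."""
    opened = 0

    def __init__(self, host, port):
        FakeClient.opened += 1
        self.host, self.port = host, port


pool = ClientPool(FakeClient)
with pool.client("hs2.example.com", 10000) as first:
    pass
with pool.client("hs2.example.com", 10000) as second:
    pass
# The second block reuses the connection opened by the first one.
```

The point of the design is that callers never construct connections themselves; they borrow one, and the pool decides whether to reuse or open.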
ACCESSIBILITY
Middleware

Make Hadoop interfaces
accessible in request objects

    class ClusterMiddleware(object):
        def process_view(self, request, ...):
            # Attach the HDFS client for this request's cluster
            request.fs = cluster.get_hdfs(request.fs_ref)
            if request.user.is_authenticated():
                if request.fs is not None:
                    # Act on HDFS as the logged-in user
                    request.fs.setuser(request.user.username)

    # Views can then use request.fs directly
    def download(request, path):
        if not request.fs.exists(path):
            raise Http404(_("File not found."))
        if not request.fs.isfile(path):
            raise PopupException(_("not a file."))
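
A framework-free sketch of the same pattern may help show what the middleware does to each request. Everything here other than the process_view hook name is a hypothetical stand-in (FakeFs, FakeUser, FakeRequest, FILESYSTEMS, get_hdfs), so the example runs without Django or a cluster.

```python
class FakeFs:
    """Stand-in for an HDFS client; remembers which user it acts as."""

    def __init__(self):
        self.user = None

    def setuser(self, username):
        self.user = username


class FakeUser:
    def __init__(self, username, authenticated=True):
        self.username = username
        self._authenticated = authenticated

    def is_authenticated(self):
        return self._authenticated


class FakeRequest:
    def __init__(self, user, fs_ref="default"):
        self.user = user
        self.fs_ref = fs_ref


FILESYSTEMS = {"default": FakeFs()}   # hypothetical registry of configured clusters


def get_hdfs(fs_ref):
    return FILESYSTEMS.get(fs_ref)


class ClusterMiddleware:
    def process_view(self, request, view_func=None, view_args=(), view_kwargs=None):
        # Populate request.fs before the view runs
        request.fs = get_hdfs(request.fs_ref)
        if request.user.is_authenticated() and request.fs is not None:
            request.fs.setuser(request.user.username)
        return None   # None tells Django to carry on to the view


request = FakeRequest(FakeUser("alice"))
ClusterMiddleware().process_view(request)
# request.fs is now available to any view, already acting as "alice"
```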

HDFS
Goal

Easily browse, create, read,
update, and delete files in
HDFS

HDFS - Communication
REST

The NameNode provides a RESTful server called WebHDFS

    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
    ...

Explicit Client

Provide an API that is explicit

    class WebHdfs(Hdfs):
        def create(self, path, ...):
            ...
        def read(self, path, ...):
            ...

Request Accessible

Provide a middleware for populating a request member

    def download(request, path):
        if not request.fs.exists(path):
            raise Http404(_("File not found."))
        if not request.fs.isfile(path):
            raise PopupException(_("not a file."))
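
To make the URL pattern concrete, here is a small sketch that builds WebHDFS URLs of the form shown on the slide. The helper name webhdfs_url, the host name, and port 50070 are assumptions for illustration; the sketch only builds the URL and does not contact a NameNode.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL: http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>."""
    query = urlencode([("op", op)] + sorted(params.items()))
    # quote() leaves "/" alone by default, so the path keeps its separators
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), query)


print(webhdfs_url("namenode.example.com", 50070, "/user/alice/data.csv", "OPEN"))
print(webhdfs_url("namenode.example.com", 50070, "/tmp/new file.txt", "CREATE",
                  overwrite="true"))
```

An explicit client such as the WebHdfs class above would hide this string building behind methods like create() and read().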
HDFS - Cool Things
MIME Type Detection

Detect the various kinds of
files being read: Avro, GZIP,
etc.
Pagination

Nice pagination by block
size when viewing a file
(soon to be more like a PDF
reader with content
automatically being added)
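
Both ideas can be sketched briefly. This is an illustration rather than the Filebrowser's actual code: detect_format and page_range are hypothetical helpers, detection here looks only at the leading magic bytes, and the page size is whatever the caller passes in.

```python
GZIP_MAGIC = b"\x1f\x8b"     # first two bytes of every gzip stream
AVRO_MAGIC = b"Obj\x01"      # header of an Avro object container file


def detect_format(head):
    """Guess the file format from the first bytes of the file."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(AVRO_MAGIC):
        return "avro"
    return "text"


def page_range(page, page_size, file_length):
    """Return (offset, length) for a 1-based page, clamped to the file size."""
    offset = min((page - 1) * page_size, file_length)
    length = min(page_size, file_length - offset)
    return offset, length


print(detect_format(b"\x1f\x8b\x08\x00"))
print(page_range(3, 4096, 10000))
```

The (offset, length) pair is what a viewer would pass to a ranged read so that only one page of a large file is fetched at a time.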

HBase
Goal

Make it easy to view and
search HBase

HBase - Technical Risk
2 Dimensions

Infinitely many columns and
rows

Sparseness

Column names will often
differ per row

HBase - Communication
Thrift

Communicate with HBase using Thrift for better filtering

Explicit Client

Provide an API that is explicit

    class HBaseApi(Hdfs):
        def createTable(self, cluster, tableName, ...):
            ...
        def getRows(self, cluster, tableName, columns, ...):
            ...
HBase - Results
Improved View

Intelligent view that
collapses null cells

Better Search

Improved searchability of
HBase via flexible search
MIME Type Detection

Able to view documents in
HBase: PDF, images, etc
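
A minimal sketch of what collapsing null cells means for sparse rows, assuming rows arrive as dictionaries keyed by "family:qualifier". The sample data and the helper names collapse_row and all_columns are hypothetical and are not taken from the HBase browser.

```python
def collapse_row(cells):
    """Drop cells with no value so only populated columns are shown."""
    return {column: value for column, value in cells.items() if value is not None}


def all_columns(rows):
    """Union of column names, i.e. the width of a naive grid view."""
    return sorted({column for cells in rows.values() for column in cells})


# Hypothetical sample data: row key -> {"family:qualifier": value}
rows = {
    "user1": {"info:name": "Ann", "info:email": None, "stats:visits": "3"},
    "user2": {"info:name": "Bob", "pref:theme": "dark"},
}

collapsed = {key: collapse_row(cells) for key, cells in rows.items()}
print(all_columns(rows))     # a grid would need one column per name, mostly empty
print(collapsed["user1"])    # the collapsed view lists only populated cells
```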

Hive
Goal

Make it easy to run queries
in Hive

Hive - Communication
Thrift

Communicate with HiveServer2 using Thrift

    thrift_util.get_client(TCLIService.Client,
                           query_server['server_host'],
                           query_server['server_port'],
                           service_name=query_server['server_name'],
                           ...)

Explicit Client

Provide a higher level API that is explicit and easy to configure

    class HiveServerClient:
        # Maps the configured HiveServer2 authentication to a SASL mechanism
        HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN',
                          'NOSASL': 'NOSASL'}

        def __init__(self, query_server, user, ...):
            thrift_util.get_client(TCLIService.Client,
                                   ...)

DBMS

Further the capacities of the DBMS in Hue

    class HiveServer2Dbms(object):
        def get_databases(self):
            return self.client.get_databases()
        ...
        def select_star_from(self, database, table):
            hql = "SELECT * FROM `%s.%s` %s" % (database,
                table.name, self._get_browse_limit_clause(table))
            return self.execute_statement(hql)
        ...
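
As an illustration of how a table like HS2_MECHANISMS might be used, the sketch below maps a configured authentication setting to the use_sasl and mechanism arguments seen in the get_client call. The helper sasl_settings is hypothetical, and the rule that only NOSASL bypasses SASL is an assumption, not something the slides state.

```python
HS2_MECHANISMS = {'KERBEROS': 'GSSAPI', 'NONE': 'PLAIN', 'NOSASL': 'NOSASL'}


def sasl_settings(auth_setting):
    """Map a HiveServer2 authentication setting to (use_sasl, mechanism)."""
    try:
        mechanism = HS2_MECHANISMS[auth_setting.upper()]
    except KeyError:
        raise ValueError("Unsupported HiveServer2 authentication: %r" % auth_setting)
    # Assumption: only the NOSASL setting skips the SASL handshake
    return mechanism != 'NOSASL', mechanism


for setting in ("KERBEROS", "NONE", "NOSASL"):
    print(setting, sasl_settings(setting))
```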
Hive - Results
One Page App

Intelligent view that lets
users worry about their
queries
Secure

Achieved some level of
security through SASL,
Kerberos, and SSL
Navigation

Able to navigate databases
and tables easily

DEMO
TIME

Missed something?
GET STARTED

Take a closer look at REST and Thrift
communication in Hue
The inner workings of the Filebrowser
The fundamentals of the HBase browser
The concepts behind the Beeswax app

What else does Hue do with Django?
Extensible settings

Configuration of settings.py provided through the hue.ini

Security

Configurable session timeouts, SAML authentication, etc.

Doc Model

Polymorphic documents via a base document model

Authentication

LDAP, PAM, OAuth, etc. provided through authentication backends

Permissions

Per-app permissions configurable in the UserAdmin

Testing

Mocked and functional tests via nose + django-nose
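
The idea of driving settings.py from an ini file can be sketched with the standard library. This is a simplified illustration only: the flat [desktop] section and the session_ttl key are assumptions made for the example and do not reproduce the real hue.ini layout, while TIME_ZONE, DEBUG, and SESSION_COOKIE_AGE are standard Django setting names.

```python
import configparser

# Hypothetical, flattened ini content used only for this sketch
SAMPLE_INI = """
[desktop]
time_zone = America/Los_Angeles
django_debug_mode = false
session_ttl = 3600
"""


def django_settings_from_ini(text):
    """Translate ini options into a dict of Django settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    desktop = parser["desktop"]
    return {
        "TIME_ZONE": desktop.get("time_zone", fallback="UTC"),
        "DEBUG": desktop.getboolean("django_debug_mode", fallback=False),
        # Fall back to Django's default session lifetime of two weeks
        "SESSION_COOKIE_AGE": desktop.getint("session_ttl", fallback=1209600),
    }


settings = django_settings_from_ini(SAMPLE_INI)
print(settings)
```

In a settings.py, the resulting values would be assigned to module-level names so that Django picks them up as usual.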

GET HUE
CLOUDERA’S CDH

Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager.

TARBALL

Try in advance the latest and greatest but you’ll have to configure everything on your own.

CLOUDERA’S DEMO VM

Get to play with Hue and various Hadoop components in 5 minutes. It’s a self contained CDH environment ready to use.

HORTONWORKS*

In HDP there’s an old forked version of Hue 2.3.

MAPR*

Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.

HP CLOUD*

The newest addition, ships Hue 3.0 through the GreenButton products.

BIGTOP

EMBEDDED/DEMO IN IND. COMPANIES

* YOUR MILEAGE MAY VARY.
LINKS
WEBSITE

http://gethue.com
GITHUB

https://github.com/cloudera/hue/
BLOG

http://blog.gethue.com
TWITTER

@gethue
USER GROUP

hue-user@

THANKS.
QUESTIONS?

gethue.com

