3. What is Apache Ambari?
Apache Ambari is the open-source platform to
provision, manage and monitor Hadoop clusters
6. Exciting Enterprise Features in Ambari 2.4
• New Services: Log Search, Zeppelin, Hive LLAP
• Role Based Access Control
• Management Packs
• Grafana UI for Ambari Metrics System
• New Views: Zeppelin, Storm
7. More in Ambari 2.4
• Alerts: Customizable props and thresholds
(AMBARI-14898)
• Alerts: Retry tolerance (AMBARI-15686)
• Alerts: New HDFS Alerts (AMBARI-14800)
• New Host Page Filtering (AMBARI-15210)
• Remove Service from UI (AMBARI-14759)
• Support for SLES 12 (AMBARI-16007)
• Stability: Database Consistency Checking
(AMBARI-16258)
• Customizable Ambari Log + PID Dirs
(AMBARI-15300)
• New Version Registration Experience
(AMBARI-15724)
• Log Search Technical Preview (AMBARI-14927)
• Operational Audit Logging (AMBARI-15241)
• Role-Based Access Control (AMBARI-13977)
• Automated Setup of Ambari Kerberos through
Blueprints (AMBARI-15561)
• Automated Setup of Ambari Proxy User
(AMBARI-15561)
• Customizable Host Reg. SSH Port (AMBARI-13450)
Core Features Security Features
• View URLs for bookmarks (AMBARI-15821),
View Refresh (AMBARI-15682)
• Inherit Cluster Permissions (AMBARI-16177)
• Remote Cluster Registration (AMBARI-16274)
Views Framework Features
9. Deploy On Premise
Ambari UI wizard handles all of these
combinations and makes recommendations
based on host specs.
10. Deploy On The Cloud
Certified environments
Sysprepped VMs
Hundreds of similar clusters
11. Deploy with Blueprints
• Systematic way of defining a cluster
• Export existing cluster into blueprint
/api/v1/clusters/:clusterName?format=blueprint
[Diagram labels: Configs, Topology, Hosts, Cluster]
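The export call above can be sketched in Python as follows. This is a minimal illustration, not an official client: the server address, cluster name, and credentials are hypothetical, and HTTP Basic authentication is assumed. The request is only built here, not sent.

```python
import base64
import urllib.request


def blueprint_export_request(base_url, cluster_name, user, password):
    """Build (but do not send) a GET request that exports a cluster as a blueprint."""
    url = f"{base_url}/api/v1/clusters/{cluster_name}?format=blueprint"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical server and credentials, for illustration only.
req = blueprint_export_request(
    "http://ambari.example.com:8080", "mycluster", "admin", "admin"
)
print(req.full_url)
# Sending it would look like: urllib.request.urlopen(req).read()
```

The response body is the blueprint JSON, which can be saved and reused to create similar clusters.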
16. Blueprints for Large Scale
• Kerberos, secure out-of-the-box
• High Availability is set up initially for
NameNode, YARN, Hive, Oozie, etc.
• Host Discovery allows Ambari to
automatically install services for a Host
when it comes online
• Stack Advisor recommendations
18. Comprehensive Security
LDAP/AD
• User auth
• Sync
Kerberos
• MIT KDC
• Keytab
management
Atlas
• Governance
• Compliance
• Lineage & history
• Data classification
Ranger
• Security policies
• Audit
• Authorization
Knox
• Perimeter security
• Supports LDAP/AD
• Sec. for
REST/HTTP
• SSL
19. Kerberos
Ambari manages Kerberos principals and keytabs
Works with existing MIT KDC or Active Directory
Once Kerberized, handles
1. Adding hosts
2. Adding components
to existing hosts
3. Adding services
4. Moving components
to different hosts
20. Management Packs
• Improved Release Management:
Decouple Ambari core from stacks
releases
• Support Add-ons:
Release vehicle for 3rd party services, views
Self-contained release artifacts
Stack is an overlay of multiple management
packs
22. Management Pack++
Short Term Goals (Ambari 2.4)
• Retrofit in Stack Processing Framework
• Enable 3rd party to ship add-on services
Future Goals
• Management Pack Framework
• Deliver Views
23. Role Based Access Control (RBAC)
As Ambari & organizations grow,
so do security needs
Ambari integrates with external
authentication systems & LDAP
24. RBAC Terms
Users belong to groups
A group has a role
Users can also have additional roles
Roles are applied to Resources. E.g.,
Ambari, particular Cluster, particular View
Roles have permissions
e.g., add services to cluster
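One way to picture these terms is as a small data model. The sketch below is an illustration only and does not reflect Ambari's internal classes: the `Role`, `Group`, and `User` types, the resource names, and the permission strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset


@dataclass
class Group:
    name: str
    # Maps a resource name (e.g. a cluster or a view) to the role granted on it.
    roles: dict = field(default_factory=dict)


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)
    roles: dict = field(default_factory=dict)  # additional per-user roles

    def permissions_on(self, resource):
        """Union of permissions from the user's own role and the group roles."""
        granted = set()
        sources = [self.roles] + [g.roles for g in self.groups]
        for role_map in sources:
            role = role_map.get(resource)
            if role is not None:
                granted |= role.permissions
        return granted


viewer = Role("viewer", frozenset({"view"}))
operator = Role("operator", frozenset({"view", "add services to cluster"}))
ops = Group("ops", roles={"cluster:prod": viewer})
alice = User("alice", groups=[ops], roles={"cluster:prod": operator})
print(sorted(alice.permissions_on("cluster:prod")))
```

Here the user's effective permissions on a resource are the union of what the group role and the additional user role grant.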
25. New RBAC Roles
Ambari Admin: all
Cluster Admin: ↑, except manage permissions
Cluster Op: ↑, except add services, Kerberos, manage alerts & upgrades
Service Admin: ↑, except alter cluster topology or install components
Service Op: ↑, except change configs
Read-Only: only view
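The "↑, except ..." notation reads as a ladder: each role has everything the role above it has, minus the listed actions. The sketch below encodes that reading in Python. The pairing of roles with restrictions is an interpretation of the slide (Ambari Admin at the top with "all", Read-Only at the bottom with "only view"), and the action strings are informal labels, not Ambari permission identifiers.

```python
# Roles from most to least privileged, each paired with what it loses
# relative to the role above it (wording taken from the slide).
ROLE_LADDER = [
    ("Ambari Admin", []),
    ("Cluster Admin", ["manage permissions"]),
    ("Cluster Op", ["add services", "Kerberos", "manage alerts", "manage upgrades"]),
    ("Service Admin", ["alter cluster topology", "install components"]),
    ("Service Op", ["change configs"]),
]


def denied_actions(role_name):
    """Accumulate every restriction from the top of the ladder down to role_name."""
    denied = []
    for name, removed in ROLE_LADDER:
        denied.extend(removed)
        if name == role_name:
            return denied
    raise KeyError(role_name)


print(denied_actions("Service Admin"))
```

Read-Only is left out of the ladder because the slide describes it positively ("only view") rather than as a list of exceptions.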
29. Background: Upgrade Terminology
Manual Upgrade
• The user follows instructions to upgrade the stack
• Incurs downtime
Rolling Upgrade
• Automated
• Upgrades one component per host at a time
• Preserves cluster operation and minimizes service impact
30. Background: Upgrade Terminology
Express Upgrade
• Automated
• Runs in parallel across hosts
• Incurs downtime
31. Automated Upgrade: Rolling or Express
• Check Prerequisites: Review the prereqs to confirm your cluster configs are ready
• Prepare: Take backups of critical cluster metadata
• Register + Install: Register the HDP repository and install the target HDP version on the cluster
• Perform Upgrade: Perform the HDP upgrade. The steps depend on upgrade method: Rolling or Express
• Finalize: Finalize the upgrade, making the target version the current version
33. Alerting Framework
Alert Type | Description | Thresholds (units)
WEB | Connects to a Web URL. Alert status is based on the HTTP response code | Response Code (n/a); Connection Timeout (seconds)
PORT | Connects to a port. Alert status is based on response time | Response (seconds)
METRIC | Checks the value of a service metric. Units vary, based on the metric being checked | Metric Value (units vary); Connection Timeout (seconds)
AGGREGATE | Aggregates the status for another alert | % Affected (percentage)
SCRIPT | Executes a script to handle the alert check | Varies
SERVER | Executes a server-side runnable class to handle the alert check | Varies
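As an illustration of a PORT-style check, the sketch below times a TCP connection and compares the response time against two thresholds. It is not Ambari's implementation: the function names and the default threshold values (1.5 and 5.0 seconds) are assumptions chosen for the example.

```python
import socket
import time


def classify_response_time(seconds, warning=1.5, critical=5.0):
    """Map a response time to an alert state using two thresholds."""
    if seconds >= critical:
        return "CRITICAL"
    if seconds >= warning:
        return "WARNING"
    return "OK"


def check_port(host, port, warning=1.5, critical=5.0):
    """Time a TCP connect; failing to connect within `critical` seconds is CRITICAL."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=critical):
            pass
    except OSError:
        return "CRITICAL"
    return classify_response_time(time.monotonic() - start, warning, critical)


print(classify_response_time(0.2), classify_response_time(2.0))
```

Keeping the classification separate from the network call makes the threshold logic easy to test without a live service.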
34. Alert Check Counts
• Customize the number of times an alert is
checked before dispatching a notification
• Avoid dispatching an alert notification (email, snmp)
in case of transient issues
35. Alerts - Configuring the Check Count
Set globally for all alerts, or override for a specific alert
Global Setting
Alert Override
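The check-count idea can be sketched as a small state machine: a notification goes out only after a configured number of consecutive non-OK checks, so a transient failure is absorbed. The class and function names below are hypothetical and the logic is a simplified model, not Ambari's code.

```python
class AlertTracker:
    """Dispatch a notification only after `check_count` consecutive non-OK checks."""

    def __init__(self, check_count=3):
        self.check_count = check_count
        self.consecutive_failures = 0
        self.notified = False

    def record(self, state):
        """Record one check result; return True when a notification should be sent."""
        if state == "OK":
            self.consecutive_failures = 0
            self.notified = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.check_count and not self.notified:
            self.notified = True
            return True
        return False


def effective_check_count(global_count, override=None):
    """A per-alert override, when present, wins over the global setting."""
    return override if override is not None else global_count


tracker = AlertTracker(check_count=effective_check_count(3))
states = ["CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL"]
results = [tracker.record(s) for s in states]
print(results)
```

In this run the single early failure is ignored because an OK check follows it, and only the third consecutive failure triggers a notification.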
40. Log Search
Search and index HDP logs!
Capabilities
• Rapid Search of all HDP component logs
• Search across time ranges, log levels, and for
keywords
Solr
Logsearch
Ambari
41. Log Search
[Diagram labels: Worker Node, Log Feeder, Solr, Log Search UI, Ambari]
Log Feeder: Java Process, Multi-output Support, Grok filters
Solr: Solr Cloud, Local Disk Storage
42. Future of Ambari
• Cloud features
• Service multi-instance (two ZK quorums)
• Service multi-versions (Spark 1.6 & Spark 2.0)
• YARN assemblies
• Patch Upgrades: upgrade individual components
in the same stack version, e.g., just DN and RM in
HDP 2.5.*.* with zero downtime
• Ambari High Availability
As good as
Editor's Notes
Single pane of glass.
Provision on the cloud
Metrics
Services
Config
Security
Alerts
Host management
Views framework
0.9 in Sep 2012
1.5 in April 2014
1.6 in July 2014
2.3.0 was not used
2.4.0 is slated with a ton of new features.
2179 Jiras.
Cadence is 2-3 major releases per year, with follow up maintenance releases in the months after.
http://jsfiddle.net/mp8rqq5x/2/
Log Search : Solr, Logfeeder (similar to Logstash), and Grafana UI
Zeppelin for data exploration and visualization that can plugin to multiple data backends
Role Based Access Control
Alerts,
Stability
EU/RU experience
LogSearch
Security automation
Views Framework ease of use
Deploy: Blueprints with Host Discovery
Secure: Kerberos, LDAP sync
Smart Configs: stack advisor, since it is painful to configure a thousand related knobs. E.g., changing the ZooKeeper quorum affects several services, and changing a log folder affects Log Search.
Upgrade: Rolling and Express Upgrade, get patches
Monitor: Ambari Alerts, Ambari Metrics
Analyze, Scale, Extend: Views, Management Packs
Cloudbreak can install on Amazon EC2, MSFT Azure,
Cluster install takes 5-10 mins, mostly downloading packages, installing bits, and starting services.
Used by HDInsight (Microsoft Azure) and Hortonworks QA
Allow cluster creation or scaling to be started via the REST API prior to all/any hosts being available. As hosts register with Ambari server they will be matched to request host groups and provisioned according to the requested topology
Allow host predicates to be specified along with host count to provide more flexibility in matching hosts to host groups. This will allow for host flavors where different host groups are matched to different host flavors
Break up the current monolithic provisioning request into a request for each host operation. For example, install on host A, start on host A, install on hostB, etc. This will allow hosts to make progress even when another host encounters a failure.
Allow a host count to be specified in the cluster creation template instead of host names. This is documented in https://issues.apache.org/jira/browse/AMBARI-6275
Install a cluster with two API calls
The blueprint contains the configs, assignment of topology to host group, stack version
The creation actually assigns hosts to each host group.
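The two request bodies can be sketched as Python dictionaries. The field names are assumptions modeled on the Ambari Blueprints documentation and should be checked against your Ambari version; the blueprint name, host names, and component choices are hypothetical. The helper at the end is only a sanity check for the example.

```python
import json

# Call 1 body: POST /api/v1/blueprints/<blueprint-name>
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.5"},
    "configurations": [],
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "ZOOKEEPER_SERVER"}]},
        {"name": "worker", "cardinality": "2",
         "components": [{"name": "DATANODE"}]},
    ],
}

# Call 2 body: POST /api/v1/clusters/<cluster-name>
cluster_template = {
    "blueprint": "small-cluster",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "m1.example.com"}]},
        {"name": "worker", "hosts": [{"fqdn": "w1.example.com"},
                                     {"fqdn": "w2.example.com"}]},
    ],
}


def unassigned_groups(blueprint, template):
    """Return blueprint host groups that the creation template gives no hosts."""
    assigned = {g["name"] for g in template["host_groups"]}
    return [g["name"] for g in blueprint["host_groups"] if g["name"] not in assigned]


print(json.dumps(cluster_template, indent=2))
print(unassigned_groups(blueprint, cluster_template))
```

The split mirrors the notes: the blueprint carries configs, topology, and stack version, while the creation template maps real hosts onto the blueprint's host groups.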
Dynamic availability
Allow host_count to be specified instead of host_names. As hosts register, they will be matched to the request host groups and provisioned according to the requested topology
When specifying a host_count, a predicate can also be specified for finer-grained control
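The matching behavior described above can be modeled conceptually as follows. This is a simplified simulation, not Ambari's algorithm: the dictionary layout, the `cpu_count` attribute, and the use of Python callables as predicates are all assumptions made for illustration.

```python
def match_host(host, pending_groups):
    """Assign a newly registered host to the first group with open slots
    whose predicate accepts it; return the group name, or None."""
    for group in pending_groups:
        if group["host_count"] > len(group["hosts"]) and group["predicate"](host):
            group["hosts"].append(host["name"])
            return group["name"]
    return None


pending = [
    {"name": "master", "host_count": 1, "hosts": [],
     "predicate": lambda h: h["cpu_count"] >= 8},
    {"name": "worker", "host_count": 2, "hosts": [],
     "predicate": lambda h: True},
]

# Hosts arrive one at a time, in no particular order.
registrations = [
    {"name": "a.example.com", "cpu_count": 4},
    {"name": "b.example.com", "cpu_count": 16},
    {"name": "c.example.com", "cpu_count": 4},
    {"name": "d.example.com", "cpu_count": 4},
]
assignments = [match_host(h, pending) for h in registrations]
print(assignments)
```

The last host stays unassigned because every group has already reached its requested host count.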
3 Terabytes since units are in MB
Kerberos:
LDAP/AD
Services: Ranger, Atlas, Knox.
Ranger: set up security policies on who can access what. Authorization of audit files, plugins for other services like HDFS, Hive, Storm, etc.
Atlas: Lineage of data, compliance, especially in health care and financial institutions
Knox: perimeter security for HTTP and REST calls in the Hadoop Services. Works with SSL, Kerberos.
Kerberos Key Distribution Center so we can define service principals and keytabs.
Can use existing KDC (key distribution center) or install one for Hadoop
Hadoop uses a rule-based system to create mappings between service principals and their related UNIX username
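The idea of mapping a service principal to a UNIX username can be illustrated with a deliberately simplified lookup. Hadoop's real rules are configured through the `hadoop.security.auth_to_local` property and use their own rule syntax, which this sketch does not implement; the realm and the service-to-user table below are hypothetical.

```python
def parse_principal(principal):
    """Split 'service/host@REALM' into (service, host, realm); host may be None."""
    name, _, realm = principal.partition("@")
    service, _, host = name.partition("/")
    return service, (host or None), realm


# Hypothetical mapping from the service part of a principal to a local user.
SERVICE_TO_USER = {"nn": "hdfs", "dn": "hdfs", "rm": "yarn", "nm": "yarn"}


def local_user(principal, realm="EXAMPLE.COM"):
    """Map a principal in the expected realm to a UNIX username."""
    service, _host, principal_realm = parse_principal(principal)
    if principal_realm != realm:
        raise ValueError(f"no rule matches {principal}")
    # Principals without a table entry keep their short name.
    return SERVICE_TO_USER.get(service, service)


print(local_user("nn/host1.example.com@EXAMPLE.COM"))
print(local_user("alice@EXAMPLE.COM"))
```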
As Ambari grows and organizations grow, so do security needs
Users have fine-grained roles over the cluster and individual views.
Granular authorization checks to distribute the responsibilities and privileges of authenticated users
Configuration files
Package with python scripts and templates
Alert definitions
Kerberos configurations, principals, identities, keytabs
Meta data
Metric details
UI controls and widgets
Stack Advisor, can now ship the recommendations for a service with the service itself, instead of a monolithic stack advisor for the entire stack.
Makes it easier to integrate customer services
Express Upgrade: fastest method to upgrade the stack, since it upgrades an entire component in batches of 100 hosts at a time
Rolling Upgrade, one component at a time per host, which can take up to 1 min. For a 100 node cluster with
This Grafana instance is specifically for AMS, not meant to be general-purpose
If customer is already using Grafana, this is not a replacement.
Grafana will support read-only access for anonymous users, and HTTPS
Aggregates across entire cluster, filter by host, top/bottom x, functions like avg/sum/min/max, filter by date range
This is not HDP Search, it is not something that the customer has to separately license, it is an embedded Solr instance
Agent/Collection process running on each host
Written in Java
Tails all service log files
Parses logs using Grok/regex. Can merge multiple line logs, e.g. stack trace
On restart, can resume from last read line. Uses checkpoint files to maintain state
Extendable design to send logs to multiple destination types.
Currently can send logs to Solr and Kafka
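The parsing behavior in these notes (regex matching, merging multi-line entries such as stack traces, and resuming from a saved position) can be sketched in a few lines. Log Feeder itself is written in Java and uses Grok patterns; this Python version is only a conceptual stand-in, and the log line layout, function name, and checkpoint representation are assumptions.

```python
import re

# Hypothetical log4j-style layout: "2016-06-28 10:15:02,123 ERROR Message text"
LINE_START = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_log(lines, resume_from=0):
    """Parse lines into events, folding continuation lines into the previous event.

    Returns (events, next_line_number); the second value plays the role
    of a checkpoint to resume from after a restart.
    """
    events = []
    for line in lines[resume_from:]:
        match = LINE_START.match(line)
        if match:
            events.append(match.groupdict())
        elif events:
            # A line without a timestamp (e.g. a stack-trace frame)
            # extends the last event.
            events[-1]["message"] += "\n" + line
    return events, len(lines)


sample = [
    "2016-06-28 10:15:02,123 INFO Starting DataNode",
    "2016-06-28 10:15:03,456 ERROR Failed to bind",
    "java.net.BindException: Address already in use",
    "\tat sun.nio.ch.Net.bind(Net.java:433)",
]
events, checkpoint = parse_log(sample)
print(len(events), events[1]["level"], checkpoint)
```

Passing the returned checkpoint back as `resume_from` on the next run skips lines that were already processed.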
Major themes.
Goal is to make it as good as Australian pies