Welcome to the Apache Ambari talk. Your speakers today are myself, Siddharth, and my colleague Srimanth from Hortonworks.
With increased adoption of Ambari throughout the enterprise, the focus at the moment is scaling out to 1000s of nodes.
With that in mind, the focus of the talk is to demonstrate operations on a 2K node cluster, with a glimpse at the future goals.
We will look at awesome features that are a part of 1.6.0
Along with things that truly identify Ambari as a platform, that is Views and Extensibility
If you attend the birds of a feather session tomorrow, we can do a further deep dive into these new developments.
This slide represents Ambari’s position in the Hadoop technology stack and highlights key integration points with services that are either Cloud compute providers or big data analytics platforms.
By the end of the talk you should have a fairly good idea of how Ambari enables the integration of these providers with the Hadoop ecosystem.
Orchestrator: Ambari State machine combined with the Action scheduler and the Heartbeat handler
Request Dispatcher: Service Provider interface and Resource provider layer
Clusters / Stacks etc. are all resources from Ambari API standpoint
Monitoring subsystem comprises Ganglia as the metrics system and Nagios for the alerts
Host Component isolation for Ambari Server, Ganglia and Nagios and Masters
All testing done on VMs in the cloud
So now we are going to look at a video. The story here is:
Let's say you have a sizeable cluster with a need for additional compute capacity, and the new hardware that you intend to add needs to be configured differently from the existing cluster configuration.
We begin by looking at the dashboard, which shows the 2000 slave nodes and the rest of the nice, customizable Ambari widgets.
Next step is to actually choose the groups of hosts that you want to customize.
What we are doing here is grouping hosts together using Config groups and we give it a name.
Let's select a few DataNodes to demonstrate this.
Note: Since this is a paid cluster and it is expensive to keep running, we are showing you a video.
The Config group manager allows you to filter by component and by regular expression. We make sure DataNode hosts are the only ones in the filter.
Next use an expression to choose hosts you want, here I just chose them at random.
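The two filtering steps in the video (by component, then by expression) can be sketched in a few lines of Python. This is only an illustration of the idea, not Ambari's implementation: the host names, the `HOSTS` inventory and the `filter_hosts` helper are all hypothetical.

```python
import re

# Hypothetical inventory: host name -> components installed on that host.
HOSTS = {
    "slave-0001.example.com": {"DATANODE", "NODEMANAGER"},
    "slave-0002.example.com": {"DATANODE", "NODEMANAGER"},
    "slave-1500.example.com": {"DATANODE"},
    "master-01.example.com": {"NAMENODE"},
}


def filter_hosts(hosts, component, pattern):
    """Return sorted host names that run `component` and match `pattern`."""
    regex = re.compile(pattern)
    return sorted(
        name
        for name, components in hosts.items()
        if component in components and regex.search(name)
    )


# Keep only DataNode hosts, then narrow them down with an expression.
selected = filter_hosts(HOSTS, "DATANODE", r"^slave-000\d")
print(selected)
```

The selected hosts are what would then be placed into the named config group.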
Now on to actually making config changes
Restart all will restart in one shot and apply the config
The other option actually allows you to do rolling restart
When rubber meets the road what do we see as the performance bottleneck:
The monitoring and alerting subsystems on large clusters are bogged down by the number of I/O operations needed to write relatively small amounts of data at a high frequency to permanent storage.
These iostat numbers are close to what we saw when we began optimizing performance. As you can see, we were writing at 1GB/min.
The most significant metric I would like to present is the load average improvements achieved through performance tuning effort
It involved tuning the rrdcached daemon used to write Ganglia data, as well as reading that data back through the Ambari API and Nagios.
The objective of this exercise is to certify Ambari with 2K nodes on run-of-the-mill VMs with little to no optimization below the application stack, and to achieve acceptable performance for all management and monitoring operations.
In theory it is possible to go above and beyond this magic number
The goal is to actually scale to 10K+ nodes managed by single Ambari instance
This is still a conceptual architecture and you can follow the discussion on the Apache Jira that is listed
A quick word on the architecture: it involves scaling out the collector daemon in proportion to cluster size.
The Views that you see in this picture will be covered later in the slide deck; they are here to represent the capability to extend Ambari to provide a user interface of your choice for visualizing data in the Hadoop cluster.
Integrated with open source Quartz scheduler
API to schedule batch of requests to be executed as per schedule
Rolling restart is the first use case for request scheduling
Schedule and go home.
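To make the rolling restart idea concrete, here is a minimal Python sketch of how a restart could be split into ordered batches with a pause between them. It does not call or reproduce Ambari's request scheduling API; `plan_rolling_restart` and the field names in each batch are hypothetical and chosen only for illustration.

```python
def plan_rolling_restart(hosts, batch_size, pause_seconds):
    """Split hosts into ordered batches, each with a start offset in seconds."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for index in range(0, len(hosts), batch_size):
        batches.append(
            {
                # Batches run strictly in this order.
                "order_id": len(batches) + 1,
                "hosts": hosts[index:index + batch_size],
                # The first batch starts immediately, later ones after a pause.
                "start_after_seconds": len(batches) * pause_seconds,
            }
        )
    return batches


# Seven illustrative DataNodes, restarted three at a time, two minutes apart.
hosts = [f"dn-{i:04d}" for i in range(1, 8)]
plan = plan_rolling_restart(hosts, batch_size=3, pause_seconds=120)
for batch in plan:
    print(batch)
```

Each batch would be submitted as one request in the schedule, so only a small slice of the cluster is down at any time.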
A Host Configuration group is a way of associating a set of configurations with a group of hosts, per service
This feature is supported with Blueprints as well, so the touch-less install can still incorporate heterogeneous target hosts
- Additionally any custom property can be added to existing configuration
- Selective application of changed configs and know exactly when and where to apply them.
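A rough way to picture a config group is as an overlay on top of the service defaults, applied only to the hosts in the group. The Python sketch below assumes that model; the group name, host names and property values are made up, and `effective_config` is a hypothetical helper rather than anything Ambari exposes.

```python
# Service-wide defaults (values are illustrative).
DEFAULTS = {
    "dfs.datanode.data.dir": "/hadoop/hdfs/data",
    "dfs.datanode.du.reserved": "1073741824",
}

# Hypothetical config group for the newly added hardware.
CONFIG_GROUPS = [
    {
        "name": "new-hardware",
        "service": "HDFS",
        "hosts": {"dn-2001", "dn-2002"},
        "overrides": {
            "dfs.datanode.data.dir": "/grid/0/hdfs/data,/grid/1/hdfs/data",
        },
    },
]


def effective_config(host, service, defaults, groups):
    """Overlay the overrides of every matching group onto the defaults."""
    config = dict(defaults)
    for group in groups:
        if group["service"] == service and host in group["hosts"]:
            config.update(group["overrides"])
    return config


print(effective_config("dn-2001", "HDFS", DEFAULTS, CONFIG_GROUPS))
print(effective_config("dn-0001", "HDFS", DEFAULTS, CONFIG_GROUPS))
```

Hosts outside the group keep the defaults untouched, which is what makes heterogeneous hardware manageable in one cluster.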
A Blueprint, as the name suggests, is a declarative definition of the cluster, which can be exported as a document from a live cluster or imported to create a new cluster from an existing blueprint.
Real world use cases: The Savanna project, Launchpad on Microsoft Azure
Quick look at how to create a cluster using blueprint
Define Host Groups: these can be thought of as the unique sets of components and configurations that represent hosts in your cluster, each with a cardinality from 1 to N.
Capture non-default configuration overrides
Point to stack name and version to use
When you POST to create a cluster you get back a request id that can be used to track progress of deployment
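The pieces listed above (host groups, configuration overrides, stack name and version) can be sketched as two JSON documents built in Python: the blueprint itself, and the template that maps real hosts onto its host groups. The field names follow our recollection of the Blueprint JSON layout and should be checked against the API documentation for your Ambari version; the blueprint name, host names, components and stack version are illustrative, and no HTTP call is made here.

```python
import json


def build_blueprint(name, stack_name, stack_version, host_groups, configurations=None):
    """Assemble a blueprint document from host group specs."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        # Non-default configuration overrides, if any.
        "configurations": configurations or [],
        "host_groups": [
            {
                "name": group,
                "cardinality": spec["cardinality"],
                "components": [{"name": c} for c in spec["components"]],
            }
            for group, spec in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Map concrete hosts onto the host groups of a blueprint."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


blueprint = build_blueprint(
    "multi-node", "HDP", "2.1",
    {
        "master": {"cardinality": "1", "components": ["NAMENODE", "RESOURCEMANAGER"]},
        "slaves": {"cardinality": "1+", "components": ["DATANODE", "NODEMANAGER"]},
    },
)
template = build_cluster_template(
    "multi-node",
    {"master": ["m1.example.com"], "slaves": ["s1.example.com", "s2.example.com"]},
)
print(json.dumps(blueprint, indent=2))
print(json.dumps(template, indent=2))
```

The first document would be registered as the blueprint, and the second is what you POST to create the cluster and get back the request id.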
Real world use case of blueprints
HDP Launchpad for Azure (Linux) lets you spin up HDP clusters super easily - no need for you to spin up VMs, create images, set up ssh etc. All you need is your Azure Account (with a credit card in good standing) to get started.
Once you get the launchpad going it will do *everything* for you and publish the Ambari URL as the control entry point.
Under the hood, after running some Azure provisioning and setup scripts, all the goodness comes from Ambari Blueprints.
When you manage a cluster of 2000 nodes, you need the ability to perform operations in bulk.
Bulk host operations are now available on the Hosts page
Basically you identify which hosts – either all, filtered or selected
Then you perform operations – either host level, or component level operations
Components generally tend to be slaves/workers which are larger in number
Component-level operations tend to run in batches.
For clusters with 2000 nodes you need good filters to easily find the appropriate hosts. Ambari provides 13 filters on its hosts page to help you.
So let's say you have one of the following:
Hardware change/replacement on some nodes
Experimenting with service configurations
Turning off a service completely
Deleting cluster nodes
Maintenance Mode silences alerts and skips operations.
Inheritance cannot be turned off on lower levels
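The inheritance rule can be sketched as a small Python function: a host component counts as being in maintenance if it was put there directly, or if its host or its service was. This is a simplified model under that assumption; the function name and the state labels it returns are used here for illustration only.

```python
def effective_maintenance(component_on, host_on, service_on):
    """Resolve the maintenance state of one host component.

    A component cannot opt out: if its host or service is in
    maintenance, the component is treated as being in maintenance too.
    """
    if component_on:
        return "ON"
    if host_on and service_on:
        return "IMPLIED_FROM_SERVICE_AND_HOST"
    if host_on:
        return "IMPLIED_FROM_HOST"
    if service_on:
        return "IMPLIED_FROM_SERVICE"
    return "OFF"


# The component itself is not in maintenance, but its host is.
print(effective_maintenance(component_on=False, host_on=True, service_on=False))
```

Anything other than "OFF" means alerts are silenced and bulk operations skip that component.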
We support safely moving the following master components from one host to another.
Even the 2 namenodes in HDFS HA.
Hadoop is an ecosystem with many services, many users and many, many use cases.
Even with all the functionality provided in Ambari, there will always be a different way to use and view your cluster.
To allow users and admins to extend and contribute their own ‘view’ of the cluster, Ambari is providing the ‘Ambari Views’ framework.
Developers can now create their ‘view’ using this framework.
Gives users and administrators a single entry point into the cluster and allows for very interesting possibilities.
Views also nicely complement stack extensibility on the backend, by providing appropriate views for them in the front end.
Question: What is the admin functionality of views?
View descriptor is the central entry point.
Here you can see the view Id, display label you see in the menu, version of the view.
Each JAR is for a version of the view. A view version can have many instances of the view.
Each view can also define the parameters it needs to work – here you see list of cities this weather view needs.
You also see a REST resource defined – all you need to implement is the Java bean and a JAX-RS annotated class.
Each view can optionally define instances by default… here you see Europe. HDFS view does not have any instances because location of NameNode is a runtime value – not known at packaging time.
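To show what such a descriptor carries, the Python sketch below parses a small descriptor and pulls out the view id, label, version, parameters and default instances. The XML element names are modeled on the description in these notes and are assumptions, as are the city values; check the Views framework documentation for the exact schema.

```python
import xml.etree.ElementTree as ET

# Illustrative descriptor for the weather view (element names are assumptions).
DESCRIPTOR = """
<view>
  <name>WEATHER</name>
  <label>Weather</label>
  <version>1.0.0</version>
  <parameter>
    <name>cities</name>
    <description>Cities to show weather for</description>
    <required>true</required>
  </parameter>
  <instance>
    <name>EUROPE</name>
    <property>
      <key>cities</key>
      <value>London;Paris</value>
    </property>
  </instance>
</view>
"""


def parse_view_descriptor(text):
    """Extract the view identity, parameters and default instances."""
    root = ET.fromstring(text.strip())
    return {
        "name": root.findtext("name"),
        "label": root.findtext("label"),
        "version": root.findtext("version"),
        "parameters": [p.findtext("name") for p in root.findall("parameter")],
        "instances": {
            inst.findtext("name"): {
                prop.findtext("key"): prop.findtext("value")
                for prop in inst.findall("property")
            }
            for inst in root.findall("instance")
        },
    }


view = parse_view_descriptor(DESCRIPTOR)
print(view)
```

A view with no `instance` elements, like the HDFS view mentioned above, would simply come back with an empty instances mapping.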
Once the view JAR is placed into Ambari, you can then see the views, versions and instances.
You can create/update/delete view instances via calls.
So if your 3rd party tool wants a view to HDFS, they can create instance and send user to link.
Something that is being worked on is administration ability for views. Admins can configure views, provide entitlement for users, etc.
So admins can control the cluster, and users can view the cluster and use it.
In Hadoop 1.0 we visualized MapReduce jobs, their dependencies, and how the map and reduce tasks performed.
In Hadoop 2.0 MapReduce has been made more generic in Apache Tez.
Apache™ Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks.
As you can see Hive, Pig and other data processing services are being ported on top of Tez.
For Hadoop 2.0 Ambari visualizes Hive queries using Tez engine.
Each Hive + Tez query is shown in the jobs table. Going to an individual job shows the Tez DAG mixed in with Hive information.
HDFS_ prefixed counters come from HDFS. They generally tend to be on the first and last vertices of the DAG because that’s where data is read from and written to HDFS.
FILE_ prefixed counters are local disk accesses for the vertex… they represent data read/written during spilling. It does not represent data transferred between vertices.
SPILLED_RECORDS – In Tez spilling of records can not only happen during vertex output (like MapReduce), but also at vertex input. For a vertex this number is for both.
Tasks
- FILE_BYTES_READ
- FILE_BYTES_WRITTEN = spill bytes size (3 reads out of 3r+3w) local disk only.
= does not include transporting across tasks
= Read configs
- HDFS_BYTES_READ|WRITTEN
= Generally on first and last vertices where HDFS is accessed.
- HDFS_READ_OPS = Listing directories (Direct HDFS counters)
- HDFS_WRITE_OPS = FS changes (Direct HDFS counters) - create folder, concat file, mkdir, etc.
- SPILLED_RECORDS = 3w+3r+1sort-w = Records in 3+1.
- They occur in Output (when spilling locally when > memory)
- They occur in Input (when collecting from multiple inputs)
- If a vertex has both Input and Output - this will be sum of both.
Summary metrics are shown for all vertices, so that you can compare relative performance of vertices.
Hive and Tez have hooks to push notifications to ATS. Ambari pulls/GETs information from ATS.
Other components plan to use ATS more – so Ambari should be able to show other types of Jobs.
To enable Hive + Tez, admins should go to Hive configurations and set “hive.execution.engine” to “tez”. Default is “mr”.
Other important tez configs are shown – like YARN container size etc for Hive+Tez queries.
The Jobs viewer can handle large queries. This one, for example, has approximately 70 Tez vertices, 12 reduce vertices.
The graph is more readable than the text above to analyze issues.
- What truly identifies Ambari as a platform – Ability to add new services and manage and monitor a custom stack of components
Stack is an all inclusive and self contained definition of all services and their life cycle within Ambari
Let's start by encapsulating components and configuration in a stack definition
Next allow a developer to define component life cycle by declaring relationships between different states of a component
REST API allows you to discover what is available
Last plug it into Ambari to bring it all together
Command scripts are a way to tell Ambari what needs to be executed in order to achieve a state change. For example, going from INSTALLED to STARTED entails executing a user-defined start script of a component in the desired stack.
Custom Commands and Custom Actions are similar to command scripts but independent of a state change and can be executed on demand using Ambari API, Example: Decommission Datanode, Run rebalancer, verify kerberos settings
Extension makes it easy to add new stacks
Command scripts are bundled with the server and downloaded to the agents.
At registration time, agents check that the MD5 checksum of the script archive on the server matches the one in the agent cache; if not, the agent downloads new definitions from the server.
This makes on demand / on site modifications easy to change and verify.
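The checksum comparison can be sketched in Python as follows. It is a simplified model of the idea, not the agent's actual code: the `cache` dictionary stands in for the agent's on-disk cache, and `sync_script_cache` is a hypothetical helper.

```python
import hashlib


def md5_of(data):
    """Hex MD5 digest of a bytes payload."""
    return hashlib.md5(data).hexdigest()


def sync_script_cache(server_archive, server_hash, cache):
    """Refresh the cache when its hash differs from the server's.

    Returns True if new definitions were taken from the server.
    """
    if cache.get("hash") == server_hash:
        return False  # Cache is current, nothing to download.
    if md5_of(server_archive) != server_hash:
        raise ValueError("downloaded archive does not match the server checksum")
    cache["archive"] = server_archive
    cache["hash"] = server_hash
    return True


cache = {}
archive_v1 = b"hbase scripts v1"
print(sync_script_cache(archive_v1, md5_of(archive_v1), cache))  # first registration
print(sync_script_cache(archive_v1, md5_of(archive_v1), cache))  # unchanged on server
```

An on-site change to a script on the server changes the archive hash, so the next registration picks it up.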
HBASE service definition in the stack
The metrics.json file defines all metrics emitted by HBASE, as well as how these metrics show up in the Ambari API
Contains configuration, package of command scripts and definition of the service in metainfo.xml
Metainfo.xml: Link HBASE_MASTER component to the script which defines the life cycle commands (start, stop, install, configure) and custom commands if any
Package: The actual command scripts which will be executed on the agents
Example of a command script. It is important to mention that Ambari's Python resource management framework allows a developer to extend a base class called Script and define resources, similar to tools like Puppet.
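The shape of such a script can be sketched as below. To keep the sketch runnable on its own, `Script` here is a small stand-in class written for this example, not the one shipped with Ambari's resource management framework, and the methods only record what a real script would do instead of declaring actual resources.

```python
class Script:
    """Stand-in for the framework's Script base class (assumption).

    It only dispatches a life cycle command name to the matching method.
    """

    COMMANDS = ("install", "configure", "start", "stop")

    def execute(self, command):
        if command not in self.COMMANDS:
            raise ValueError("unsupported command: " + command)
        return getattr(self, command)(env={})


class HbaseMaster(Script):
    """Life cycle commands for a hypothetical HBASE_MASTER component."""

    def __init__(self):
        self.log = []

    def install(self, env):
        self.log.append("install packages")

    def configure(self, env):
        self.log.append("write hbase-site.xml")

    def start(self, env):
        # Make sure configuration is in place before starting.
        self.configure(env)
        self.log.append("start hbase master")

    def stop(self, env):
        self.log.append("stop hbase master")


master = HbaseMaster()
master.execute("start")
print(master.log)
```

In a real stack, metainfo.xml points the HBASE_MASTER component at a script like this, and the agent runs the method that matches the requested state change.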