© Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Apache Ambari
Managing a 2000-node Hadoop cluster
Siddharth Wagle, PMC
swagle (@apache, @hortonworks)
Srimanth Gunturi, PMC
srimanth (@apache, @hortonworks)
Agenda
• Operating at scale
• Lessons learned
• Beyond 2K
• Ambari 1.6.0 highlights
• New Management features
• Blueprints
• Ambari Views
• Extensibility
• Q & A
Ambari: Enterprise Hadoop Operations
Apache Ambari is the only 100% open source framework for
provisioning, managing and monitoring Apache Hadoop clusters
[Diagram: Ambari Web, Viewpoint, and other clients use the Ambari REST APIs exposed by Ambari Server (provision | manage | monitor), which sits on top of the cluster's compute and storage nodes.]
100% Apache Open Source
• Active Community
- 70+ Contributors / 40+ Committers
- 240+ Ambari User Group Members
• Timeline
- Dec 2013: Apache Ambari graduates to Top-Level Project
- Apr 2014: Apache Ambari 1.5.1 released; adds operations for Hadoop 2.1 Stack
- May 2014: Apache Ambari 1.6.0 released; new Ambari features
Overview and Architecture
Platform Architecture
[Diagram: Ambari Web (JS) and other web clients call the REST API (/clusters, /stacks, /views …) on Ambari Server (Java). Server blocks: request dispatcher, DB, orchestrator, monitoring, and a configurable auth provider (LDAP, AD). An RDBMS holds cluster configurations and topology; resource definitions cover stacks, actions, and views. Ambari Agents (Python, Puppet) are installed by bootstrap or manually. Monitoring providers: Ganglia, Nagios, JMX. A user repo is also shown.]
Demo
2000 Nodes on commodity hardware
Process         CPU       RAM (process)
Ambari Server   16 core   2 GB
Ganglia         16 core   8 GB
Nagios          8 core    8 GB
Masters         8 core    8 GB
Slaves          1 core    4 GB
Demo Video
• Increase compute capacity with Next Gen Slaves
• Group the new hosts with Manage Config groups feature
• Override a default config property for the new group
• Apply the config by performing rolling restarts on the next-gen slaves, with little to no expected downtime
Optimizations with Ambari 1.6.1
• Better utilization of rrdcached
• Tuning Nagios with recommended performance configurations
• Ambari API optimizations
• Load average comparison
Process          Starting point     1.6.1
Ambari Server    > 10 (0.63)        ~ 6.0 (0.37)
Ganglia Server   > 12 (0.75)        ~ 0.94 (0.06)
Nagios           > 14 (1.75)        ~ 6.8 (0.85)
• iostats
Process          Starting point     1.6.1
Ganglia Server   > 10.3 GB writes   ~ 0.3 GB writes
                 > 34 MB reads      cached reads
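The parenthesized figures in the load average comparison appear to be the load average divided by the core count from the demo hardware table (for example, 10 / 16 ≈ 0.63). That reading is an inference from the numbers, not something the slides state. A minimal sketch of the normalization, with all names chosen for illustration:

```python
# Core counts are taken from the demo hardware table; load averages from the
# comparison slide. The "starting point" values are lower bounds ("> 10").
CORES = {"Ambari Server": 16, "Ganglia Server": 16, "Nagios": 8}
LOAD_AVERAGE = {
    # process: (starting point, with 1.6.1)
    "Ambari Server": (10.0, 6.0),
    "Ganglia Server": (12.0, 0.94),
    "Nagios": (14.0, 6.8),
}


def load_per_core(load, cores):
    """Normalize a load average by the number of cores on the host."""
    return load / cores


def per_core_table():
    """Return {process: (per-core load before, per-core load after)}."""
    return {
        name: tuple(load_per_core(value, CORES[name]) for value in pair)
        for name, pair in LOAD_AVERAGE.items()
    }


if __name__ == "__main__":
    for name, (before, after) in per_core_table().items():
        print(f"{name}: {before:.3f} -> {after:.3f}")
```

A per-core value above 1.0, as with the Nagios starting point, suggests more runnable work than the host has cores to serve.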
Beyond 2K?
• Better metric collection with fan out
• Ability to export metrics to existing analytics and long term metric
persistence solutions like OpenTSDB
• Improve the alerting subsystem to minimize I/O overhead for alerts
processing
• Server scale-out solution for handling heartbeats and server-agent communication for 10K+ nodes
Future of Ambari Metrics System? (AMBARI-5707)
[Diagram labels: Hadoop Daemon with AmbariMetricsSink; Ambari Agent with HostMetricsCollector; rack-aware Ambari Metrics Collector (1…N); AmbariMetricsService; MySQL; long-term storage; Ambari; Ambari Views (Hive, Pig, Tez).]
Ambari 1.6.0 Features
Request Scheduling
• Open source quartz scheduler integration
• Create a batch of requests executed in the order of creation
• Expose API to allow user to create own schedules
Rolling Restarts
• Goal: minimize cluster downtime
• Optionally include only hosts with configuration changes
• Set host batch size + time to wait between batches
• Set failure tolerance to halt restarts automatically
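The batching behaviour described in these bullets can be sketched as follows. This is an illustrative simulation, not Ambari's scheduler: the function names, the `restart` callback, and the rule "halt once failures exceed the tolerance" are assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List, Tuple


def batches(hosts: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most batch_size hosts."""
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


def rolling_restart(
    hosts: List[str],
    restart: Callable[[str], bool],   # hypothetical: returns True on success
    batch_size: int = 2,
    wait_seconds: float = 0.0,
    failure_tolerance: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str], bool]:
    """Restart hosts batch by batch.

    Returns (restarted, failed, halted). The run halts after a batch once
    the number of failed hosts exceeds failure_tolerance (assumed rule).
    """
    restarted: List[str] = []
    failed: List[str] = []
    host_batches = list(batches(hosts, batch_size))
    for index, batch in enumerate(host_batches):
        for host in batch:
            (restarted if restart(host) else failed).append(host)
        if len(failed) > failure_tolerance:
            return restarted, failed, True
        if index < len(host_batches) - 1:
            sleep(wait_seconds)  # time to wait between batches
    return restarted, failed, False
```

Keeping the batch size small relative to the cluster limits how much capacity is offline at any moment, which is the point of the "minimize cluster downtime" goal.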
Host Configuration Groups
• Set custom configuration properties for one or more host groups (i.e., “host overrides”)
• Important for handling “heterogeneous” hardware clusters
–Different memory, mount points, directories
• Example: HEAPSIZE=1024 for one host group, HEAPSIZE=512 for another
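One way to picture host overrides is as a default property set with per-group overrides layered on top. The sketch below is not Ambari's data model: the group name, host names, and the choice of 1024 as the default (with 512 as the override) are assumptions for illustration.

```python
# Cluster-wide defaults (hypothetical values; the slide only shows
# HEAPSIZE=1024 and HEAPSIZE=512 without saying which is the default).
DEFAULTS = {"HEAPSIZE": "1024", "data_dir": "/hadoop/hdfs/data"}

# Each config group names its member hosts and the properties it overrides.
CONFIG_GROUPS = {
    "next-gen-slaves": {
        "hosts": {"slave-101", "slave-102"},
        "overrides": {"HEAPSIZE": "512"},
    },
}


def effective_config(host, defaults=DEFAULTS, groups=CONFIG_GROUPS):
    """Return the properties a host would see: defaults plus group overrides."""
    config = dict(defaults)
    for group in groups.values():
        if host in group["hosts"]:
            config.update(group["overrides"])
    return config


if __name__ == "__main__":
    print(effective_config("slave-101"))  # member of the override group
    print(effective_config("slave-001"))  # falls back to the defaults
```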
Staged Configuration Changes
• Restart indicators
• Push changes without affecting liveness of the service
Ambari Blueprints
• Blueprint defines a cluster layout and
component configuration
• Simplifies “Headless Installs”
• Export blueprint from cluster
• Boot and Save wizard with blueprint
[Diagram: a blueprint (stack, hosts, services, components, configs), together with a host manifest (hosts, metadata) and service configs (properties), is submitted to Ambari via REST; Ambari then provisions the cluster.]
Cluster create with Blueprint
• POST /api/v1/blueprints/:blueprintName → 201 Created
• POST /api/v1/clusters/:clusterName → 202 Accepted
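The two calls can be sketched with the standard library as below. The requests are only constructed, never sent. The server address, blueprint and cluster names, host names, and component layout are hypothetical, and the JSON field names and the `X-Requested-By` header are assumptions based on the Ambari Blueprints documentation that should be verified against the release in use.

```python
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical server address

# Step 1 body: the blueprint (cluster layout; field names are assumptions).
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.1"},
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "slaves", "cardinality": "2",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

# Step 2 body: maps real hosts onto the blueprint's host groups.
cluster_template = {
    "blueprint": "demo-blueprint",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "slaves", "hosts": [{"fqdn": "slave1.example.com"},
                                     {"fqdn": "slave2.example.com"}]},
    ],
}


def build_post(path, body):
    """Build (but do not send) a JSON POST request for the Ambari REST API."""
    return urllib.request.Request(
        AMBARI + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Requested-By": "ambari"},
        method="POST",
    )


# Expected responses per the slide: 201 Created, then 202 Accepted.
register = build_post("/api/v1/blueprints/demo-blueprint", blueprint)
create = build_post("/api/v1/clusters/demo-cluster", cluster_template)
```

Sending either request would be a matter of passing it to `urllib.request.urlopen` with suitable credentials; the 202 response indicates that provisioning continues asynchronously.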
Bulk Host Operations
• Perform operations such as Stop, Start, Restart, Decommission,
Maintenance Mode in “bulk” form
• Perform operations on all hosts, filtered hosts or a selected group of
hosts
• Perform host level operations, or component type operations.
Bulk Host Operations
• 10+ ways to filter hosts - component type and state, alerts, stale
configurations, maintenance mode, etc.
• Component type start, stop, restart operations are performed in
batches
Maintenance Mode
• Goal: silence alerts for services, hosts and components when
performing maintenance
• Ability to put Service or Host “Out of Service”
• Alerts will be suspended for that item
• Item will not respond to bulk operations (such as restarts)
Maintenance Mode
• Components inherit maintenance mode from either service or host
• Service/Host in maintenance mode
–Bulk operations skipped
–Host/Service operations skipped (start all, stop all and restart all)
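The inheritance rule can be sketched as a small filter: a component counts as being in maintenance mode if either its service or its host is, and such components are skipped by bulk operations. The class and function names below are illustrative, not Ambari's.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Component:
    name: str
    service: str
    host: str


def in_maintenance(component: Component,
                   services_off: Set[str],
                   hosts_off: Set[str]) -> bool:
    """A component inherits maintenance mode from its service or its host."""
    return component.service in services_off or component.host in hosts_off


def bulk_targets(components: Iterable[Component],
                 services_off: Set[str],
                 hosts_off: Set[str]) -> List[Component]:
    """Return the components a bulk operation (e.g. restart all) would touch."""
    return [c for c in components
            if not in_maintenance(c, services_off, hosts_off)]
```

For example, with HDFS on two hosts and YARN on one of them, putting one host into maintenance mode would leave only the components on the other host as bulk-restart targets.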
Moving Masters
• Move master components to different hosts
– NameNode (including HA)
– SecondaryNameNode
– JobTracker (Hadoop 1)
– ResourceManager (Hadoop 2)
Views
Ambari Views
• Goal: Customize the Ambari Web experience
• Allows creation of custom views (API and UI) of cluster
• Gives users and admins a single entry point to cluster
• Views complement Stack Extensibility
–Stack Extensibility makes custom Stack Services available to
Ambari
–Views expose custom UI features for Services
• Ambari Admins can entitle “views” to Ambari Web users
–Entitlements framework for finer-grained permissions control for
Ambari users
Ambari Views – Demo
Ambari Views – Packaging
files-0.1.0-SNAPSHOT.jar
├── WEB-INF
│   ├── web.xml
│   └── lib
├── index.html
├── org
│   └── apache
│       └── ambari
│           └── view
│               └── filebrowser
│                   ├── HdfsApi.class
│                   └── ...
└── view.xml
# ls -l /var/lib/ambari-server/resources/views/
-rw-r--r--. 1 root root 26023710 Jun 1 00:55 files-0.1.0-SNAPSHOT.jar
-rw-r--r--. 1 root root 22578573 Jun 1 00:55 pig-0.1.0-SNAPSHOT.jar
-rw-r--r--. 1 root root 54649972 Jun 1 00:55 slider-0.1.0-SNAPSHOT.jar
Ambari Views: view.xml
<view>
  <name>WEATHER</name>
  <label>Weather</label>
  <version>1.0.0</version>
  <parameter>
    <name>cities</name>
    <description>The list of cities.</description>
    <required>true</required>
  </parameter>
  <resource>
    <name>city</name>
    <plural-name>cities</plural-name>
    <id-property>id</id-property>
    <resource-class>org.apache.ambari.view.weather.CityResource</resource-class>
    <provider-class>org.apache.ambari.view.weather.CityResourceProvider</provider-class>
    <service-class>org.apache.ambari.view.weather.CityService</service-class>
  </resource>
  <instance>
    <name>EUROPE</name>
    <property>
      <key>cities</key>
      <value>London, UK;Paris;Munich</value>
    </property>
  </instance>
</view>
Ambari Views – Framework API
• GET
– http://server:8080/api/v1/views
– http://server:8080/api/v1/views/{view-id}/versions
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
• POST
– Create new instance of view with appropriate parameters
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
– Parameter example for HDFS view – dataworker.defaultFS, dataworker.username
• PUT
– Update {view-instance} with modified parameters
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
• DELETE
– Delete {view-instance}
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
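Since create, update, and delete all target the same instance URL and differ only in HTTP method, the URL can be assembled once. A small sketch follows; the helper names and the action-to-method table are illustrative, and only the URL layout comes from the slide.

```python
from urllib.parse import quote

BASE = "http://server:8080/api/v1/views"  # placeholder host, as on the slide

# Which HTTP method the slide associates with each action on an instance.
METHOD_FOR_ACTION = {
    "read": "GET",
    "create": "POST",
    "update": "PUT",
    "delete": "DELETE",
}


def instance_url(view_id: str, version: str, instance: str) -> str:
    """Build the URL of one view instance, percent-encoding each segment."""
    parts = [view_id, "versions", version, "instances", instance]
    return BASE + "/" + "/".join(quote(part, safe="") for part in parts)


def instance_request(action: str, view_id: str, version: str, instance: str):
    """Return the (method, url) pair for an action on a view instance."""
    return METHOD_FOR_ACTION[action], instance_url(view_id, version, instance)
```

For instance, `instance_request("delete", "FILES", "0.1.0", "HDFS")` pairs `DELETE` with the same URL that `"create"` would POST to.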
Ambari Views – View Instance API
• GET UI
– http://server:8080/views/{view-id}/{view-version}/{view-instance}
• GET API
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/resources/{resource-name}
– http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/{servlet-path}
• Example: HDFS
– GET: http://views-1:8080/views/FILES/0.1.0/HDFS
– GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/listdir?path=%2F
– GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/download/browse?path=%2Fuser%2Fhdfs%2FplayerYears.pig&download=true
– POST: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/rename
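The HDFS example URLs percent-encode the path parameter (`/` becomes `%2F`). A sketch of producing them with the standard library is shown below; the helper name and its argument layout are assumptions, while the URL shape follows the examples on the slide.

```python
from urllib.parse import urlencode


def resource_url(host, view_id, version, instance, resource, operation, **params):
    """Build a view-resource URL; query parameters are percent-encoded."""
    url = (
        f"http://{host}/api/v1/views/{view_id}/versions/{version}"
        f"/instances/{instance}/resources/{resource}/{operation}"
    )
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    # Directory listing of the HDFS root.
    print(resource_url("views-1:8080", "FILES", "0.1.0", "HDFS",
                       "files", "fileops/listdir", path="/"))
    # File download, as in the third example on the slide.
    print(resource_url("views-1:8080", "FILES", "0.1.0", "HDFS",
                       "files", "download/browse",
                       path="/user/hdfs/playerYears.pig", download="true"))
```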
Ambari Views – Single cluster interface
• Administrators can control the cluster
• Data Workers can use the cluster
Jobs
Ambari Jobs
• Hadoop 1.0: MapReduce
– Visualize MapReduce jobs in swimlanes
– Task scatter plots across jobs
Ambari Jobs
• Hadoop 2.0: YARN + Tez
Ambari Jobs
• Visualize Hive queries using Tez engine
Ambari Jobs
Ambari Jobs - Counters
• FILE_BYTES_READ + HDFS_BYTES_READ
• FILE_BYTES_WRITTEN + HDFS_BYTES_WRITTEN
• HDFS_WRITE_OPS / HDFS_BYTES_WRITTEN
• HDFS_READ_OPS / HDFS_BYTES_READ
• FILE_WRITE_OPS / FILE_BYTES_WRITTEN
• FILE_READ_OPS / FILE_BYTES_READ
• SPILLED_RECORDS
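The counter combinations listed on this slide are simple sums and ratios of raw job counters. A sketch of computing them is shown below; the output key names and the sample values are made up for illustration, and only the counter names and the arithmetic come from the slide.

```python
def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (no bytes moved)."""
    return numerator / denominator if denominator else 0.0


def derived_counters(c):
    """Combine raw job counters the way the slide lists them."""
    return {
        "bytes_read": c["FILE_BYTES_READ"] + c["HDFS_BYTES_READ"],
        "bytes_written": c["FILE_BYTES_WRITTEN"] + c["HDFS_BYTES_WRITTEN"],
        "hdfs_write_ops_per_byte": ratio(c["HDFS_WRITE_OPS"], c["HDFS_BYTES_WRITTEN"]),
        "hdfs_read_ops_per_byte": ratio(c["HDFS_READ_OPS"], c["HDFS_BYTES_READ"]),
        "file_write_ops_per_byte": ratio(c["FILE_WRITE_OPS"], c["FILE_BYTES_WRITTEN"]),
        "file_read_ops_per_byte": ratio(c["FILE_READ_OPS"], c["FILE_BYTES_READ"]),
        "spilled_records": c["SPILLED_RECORDS"],
    }


# Made-up sample values, for illustration only.
SAMPLE = {
    "FILE_BYTES_READ": 1000, "HDFS_BYTES_READ": 4000,
    "FILE_BYTES_WRITTEN": 2000, "HDFS_BYTES_WRITTEN": 0,
    "HDFS_WRITE_OPS": 0, "HDFS_READ_OPS": 8,
    "FILE_WRITE_OPS": 4, "FILE_READ_OPS": 2,
    "SPILLED_RECORDS": 17,
}

if __name__ == "__main__":
    for name, value in derived_counters(SAMPLE).items():
        print(f"{name}: {value}")
```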
Ambari Jobs – DAG Graph
Summary Metrics
• Input
• Output
• Tez Tasks
• Spilled Records
Vertex Types
• Map Vertex
• Reduce Vertex
• Union Vertex
Hive Operators
Edge Types
• Scatter Gather
• Broadcast
• Contains
Ambari Jobs
• Event notification flow
[Diagram: ATS (Application Timeline Server, YARN) and Ambari, connected by arrows labeled PUSH and PULL.]
Ambari Jobs - Configurations
Ambari Jobs – Scaling
Extensibility
Ambari Stacks
• Goal: Reduce time + effort to add new Services to Ambari for
provisioning, management and monitoring
• Ambari defines a consistent Service lifecycle management interface
that can be extended
• Dynamically add Stacks + Services definitions
[Diagram: Ambari (REST API and Ambari Web) with two stacks. The HDP-2.0 stack shows HDFS, YARN, MR2, Hive, Pig, and Oozie plus services marked NEW; the 2.0-GlusterFS stack shows GlusterFS, YARN, MR2, and Hive plus a service marked NEW.]
Stack Details
• Stacks define Services + Repos
– What is in the Stack, and where to get the bits
• Each Service has a definition
– What Components are part of the Service
• Each Service has defined lifecycle commands
– start, stop, status, install, configure
• Lifecycle is controlled via command scripts
• Ability to define “custom” commands
• Ability to “extend” Stacks
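To make the idea of lifecycle commands and command scripts concrete, here is a self-contained sketch. It does not use the Ambari agent libraries: `ServiceScript`, `ExampleMaster`, and the `refresh` custom command are hypothetical names, and the dispatch logic is an assumption about how a command name might be mapped to a script method.

```python
class ServiceScript:
    """Hypothetical base class for a command script (not the Ambari API)."""

    LIFECYCLE = ("install", "configure", "start", "stop", "status")

    def custom_commands(self):
        """Names of extra commands this script defines beyond the lifecycle."""
        return ()

    def execute(self, command):
        """Dispatch a command name to the method of the same name."""
        allowed = self.LIFECYCLE + tuple(self.custom_commands())
        if command not in allowed:
            raise ValueError(f"unsupported command: {command}")
        return getattr(self, command)()


class ExampleMaster(ServiceScript):
    """Toy component that only tracks state in memory."""

    def __init__(self):
        self.running = False
        self.log = []

    def install(self):
        self.log.append("install")
        return "installed"

    def configure(self):
        self.log.append("configure")
        return "configured"

    def start(self):
        self.configure()  # write configs before starting
        self.running = True
        self.log.append("start")
        return "started"

    def stop(self):
        self.running = False
        self.log.append("stop")
        return "stopped"

    def status(self):
        return "running" if self.running else "stopped"

    def custom_commands(self):
        return ("refresh",)

    def refresh(self):
        self.log.append("refresh")
        return "refreshed"
```

A real command script would install packages, render configuration files, and manage a daemon process; the sketch only shows how a fixed lifecycle plus optional custom commands can share one dispatch path.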
[Diagram: Ambari Server holds the Stack, made up of repos, service definitions (XML), and command scripts (Python), and serves multiple Ambari Agents.]
Stack Mechanics
• Ambari Server reads Stack definitions on start
• Ambari Server sends a command to Agents
• Agents download Stack definition + command scripts
• Agent executes command
• If the Stack definition changes, Agent will request latest Stack
definition + command scripts
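The last bullet implies the agent keeps a local copy of the stack definition and refreshes it only when it changes. The slides do not say how a change is detected; the sketch below assumes a content hash sent along with each command, and every name in it is hypothetical.

```python
import hashlib


def definition_hash(files):
    """Hash a {path: bytes} mapping of stack files in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(files[path])
    return digest.hexdigest()


class AgentCache:
    """Agent-side cache of the stack definition and command scripts."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that downloads files from the server
        self._hash = None
        self._files = {}
        self.downloads = 0

    def ensure_current(self, server_hash):
        """Download the definition only if the server's hash differs."""
        if server_hash != self._hash:
            self._files = self._fetch()
            self._hash = server_hash
            self.downloads += 1
        return self._files
```

With this scheme, repeated commands against an unchanged stack cost no extra downloads, which matters when thousands of agents receive commands at once.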
Declarative Definition
In closing …
Everyone is welcome to contribute
• Thank you for all the contributions
• Bring your favorite Hadoop services to Ambari
• Useful Links
– Website
– http://ambari.apache.org
– Mailing Lists
– http://ambari.apache.org/mail-lists.html
– Development Wiki
– https://cwiki.apache.org/confluence/display/AMBARI
• Current and Upcoming Releases
– Ambari 1.6.1 (pending release)
– Ambari 1.6.0 (May)
Thank you.
Page 55

Contenu connexe

Tendances

Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication confluent
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Kubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentKubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentCloudOps2005
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개OpenStack Korea Community
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka confluent
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelinesSumant Tambe
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 

Tendances (20)

Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Kubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentKubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy Agent
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 

Similaire à Managing a 2000 Node Hadoop Cluster with Apache Ambari

Accumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit
 
Apache Ambari - What's New in 1.6.0
Apache Ambari - What's New in 1.6.0Apache Ambari - What's New in 1.6.0
Apache Ambari - What's New in 1.6.0Hortonworks
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Hortonworks
 
Ambari blueprints-overview
Ambari blueprints-overviewAmbari blueprints-overview
Ambari blueprints-overviewShivaji Dutta
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Ambari Meetup: What's New in Ambari
Ambari Meetup: What's New in AmbariAmbari Meetup: What's New in Ambari
Ambari Meetup: What's New in AmbariHortonworks
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDataWorks Summit
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariJayush Luniya
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariHortonworks
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 
Apache Ambari - What's New in 2.4
Apache Ambari - What's New in 2.4 Apache Ambari - What's New in 2.4
Apache Ambari - What's New in 2.4 Hortonworks
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalChris Westin
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Janos Matyas
 
Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013Hortonworks
 
Architecture & Operations
Architecture & OperationsArchitecture & Operations
Architecture & OperationsVMware Tanzu
 
Bring your Service to YARN
Bring your Service to YARNBring your Service to YARN
Bring your Service to YARNDataWorks Summit
 
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...VMware Tanzu
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 

Similaire à Managing a 2000 Node Hadoop Cluster with Apache Ambari (20)

Accumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache Accumulo
 
Apache Ambari - What's New in 1.6.0
Apache Ambari - What's New in 1.6.0Apache Ambari - What's New in 1.6.0
Apache Ambari - What's New in 1.6.0
 
Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1Apache Ambari - What's New in 2.1
Apache Ambari - What's New in 2.1
 
Ambari blueprints-overview
Ambari blueprints-overviewAmbari blueprints-overview
Ambari blueprints-overview
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Ambari Meetup: What's New in Ambari
Ambari Meetup: What's New in AmbariAmbari Meetup: What's New in Ambari
Ambari Meetup: What's New in Ambari
 
What's new OpenStack kilo
What's new OpenStack kiloWhat's new OpenStack kilo
What's new OpenStack kilo
 
Deploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARIDeploying and Managing Hadoop Clusters with AMBARI
Deploying and Managing Hadoop Clusters with AMBARI
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache Ambari
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache Ambari
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Apache Ambari - What's New in 2.4
Apache Ambari - What's New in 2.4 Apache Ambari - What's New in 2.4
Apache Ambari - What's New in 2.4
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.final
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013Apache Ambari BOF - Overview - Hadoop Summit 2013
Apache Ambari BOF - Overview - Hadoop Summit 2013
 
Architecture & Operations
Architecture & OperationsArchitecture & Operations
Architecture & Operations
 
Bring your Service to YARN
Bring your Service to YARNBring your Service to YARN
Bring your Service to YARN
 
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...
Pivotal CenturyLink Cloud Platform Seminar Presentations: Architecture & Oper...
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 


Managing a 2000 Node Hadoop Cluster with Apache Ambari

  • 1. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
       Apache Ambari: Managing 2000 node Hadoop cluster
       Siddharth Wagle, PMC, swagle (@apache, @hortonworks)
       Srimanth Gunturi, PMC, srimanth (@apache, @hortonworks)
  • 2. Agenda
       • Operating at scale
       • Lessons learned
       • Beyond 2K
       • Ambari 1.6.0 highlights
       • New Management features
       • Blueprints
       • Ambari Views
       • Extensibility
       • Q & A
  • 3. Ambari: Enterprise Hadoop Operations
       Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters.
       [Diagram labels: Ambari Web, Viewpoint, Others; Ambari REST APIs; Ambari Server (Provision | Manage | Monitor); compute & storage nodes]
  • 4. 100% Apache Open Source
       • Active Community
         – 70+ Contributors / 40+ Committers
         – 240+ Ambari User Group Members
       • Dec 2013: Apache Ambari graduates to Top Level Project
       • Apr 2014: Apache Ambari 1.5.1 released; adds operations for Hadoop 2.1 Stack
       • May 2014: Apache Ambari 1.6.0 released; new Ambari features
  • 5. Overview and Architecture
  • 6. Platform Architecture
       [Diagram labels]
       • Ambari Web (JS): web client of the REST API
       • Ambari Server (java): REST API (/clusters, /stacks, /views …), Request Dispatcher, Orchestrator, Monitoring, configurable AuthProvider (LDAP, AD), DB (RDBMS)
       • Ambari Agent/s (python, puppet): bootstrap or manual install
       • Monitoring providers: Ganglia / Nagios / jmx
       • Cluster configurations and topology resources
       • Definitions: stacks, actions, views
       • User, Repo
  • 7. Demo: 2000 Nodes on commodity hardware
       Process         CPU       RAM (process)
       Ambari Server   16 core   2 GB
       Ganglia         16 core   8 GB
       Nagios          8 core    8 GB
       Masters         8 core    8 GB
       Slaves          1 core    4 GB
  • 8. Demo Video
       • Increase compute capacity with Next Gen Slaves
       • Group the new hosts with the Manage Config Groups feature
       • Override a default config property for the new group
       • Apply the config by performing rolling restarts on the next gen slaves, expecting zero to little downtime
  • 10. Optimizations with Ambari 1.6.1
        • Better utilization of rrdcached
        • Tuning Nagios with recommended performance configurations
        • Ambari API optimizations
        Load average comparison
        Process          Starting point   1.6.1
        Ambari Server    > 10 (0.63)      ~ 6.0 (0.37)
        Ganglia Server   > 12 (0.75)      ~ 0.94 (0.06)
        Nagios           > 14 (1.75)      ~ 6.8 (0.85)
        iostats
        Process          Starting point     1.6.1
        Ganglia Server   > 10.3 GB writes   ~ 0.3 GB writes
                         > 34 MB reads      cached reads
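The parenthesized figures in the load average table look like the load average divided by each process's core count from the demo hardware table (16 cores for Ambari Server and Ganglia, 8 for Nagios). That reading is an inference, not something the slides state. The sketch below only checks that the arithmetic is consistent, treating "> 10" as 10 and "~ 6.0" as 6.0.

```python
# Core counts come from the demo hardware table; pairing them with the
# load-average rows is an assumption made for this consistency check.
CORES = {"Ambari Server": 16, "Ganglia Server": 16, "Nagios": 8}

# (load average, figure shown in parentheses) as read from the slide.
STARTING_POINT = {
    "Ambari Server": (10.0, 0.63),
    "Ganglia Server": (12.0, 0.75),
    "Nagios": (14.0, 1.75),
}
AFTER_1_6_1 = {
    "Ambari Server": (6.0, 0.37),
    "Ganglia Server": (0.94, 0.06),
    "Nagios": (6.8, 0.85),
}


def load_per_core(load_average: float, cores: int) -> float:
    """Normalize a load average by the number of cores."""
    return load_average / cores


def max_deviation(rows: dict) -> float:
    """Largest gap between the computed per-core load and the slide's figure."""
    return max(
        abs(load_per_core(load, CORES[name]) - shown)
        for name, (load, shown) in rows.items()
    )


if __name__ == "__main__":
    for label, rows in (("starting point", STARTING_POINT), ("1.6.1", AFTER_1_6_1)):
        print(label, "max deviation from slide:", max_deviation(rows))
```

Under this reading, a per-core value above 1.0 (Nagios at the starting point) would mean more runnable work than cores, and every process drops below 1.0 with 1.6.1.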
  • 11. Beyond 2K?
        • Better metric collection with fan out
        • Ability to export metrics to existing analytics and long term metric persistence solutions like OpenTSDB
        • Improve the alerting subsystem to minimize I/O overhead for alerts processing
        • Server scale-out solution for handling heartbeats and server-agent communication for 10K+ nodes
  • 12. Future of Ambari Metrics System? (AMBARI-5707)
        [Diagram labels: Hadoop Daemon, AmbariMetricsSink; Ambari Agent, HostMetricsCollector; rack-aware Ambari Metrics Collector (1…N), AmbariMetricsService; MySQL; long term storage; Ambari; Ambari Views (Hive, Pig, Tez)]
  • 13. Ambari 1.6.0 Features
  • 14. Request Scheduling
        • Open source Quartz scheduler integration
        • Create a batch of requests executed in the order of creation
        • Expose API to allow users to create their own schedules
  • 15. Rolling Restarts
        • Goal: minimize cluster downtime
        • Optionally include only hosts with configuration changes
        • Set host batch size + time to wait between batches
        • Set failure tolerance to halt restarts automatically
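The three knobs on this slide (batch size, wait time between batches, failure tolerance) can be illustrated with a small simulation. This is a minimal sketch of the idea, not Ambari's scheduler: `rolling_restart` and its `restart` callback are hypothetical names, and halting once failures exceed the tolerance is an assumption about how the threshold is applied.

```python
import time
from typing import Callable, List, Sequence, Tuple


def rolling_restart(
    hosts: Sequence[str],
    restart: Callable[[str], bool],
    batch_size: int,
    wait_seconds: float,
    failure_tolerance: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str], bool]:
    """Restart hosts batch by batch.

    Returns (restarted, failed, halted). The run halts after a batch
    once the number of failed hosts exceeds failure_tolerance.
    """
    restarted: List[str] = []
    failed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        for host in hosts[start:start + batch_size]:
            # restart() is a caller-supplied stand-in for the real operation.
            (restarted if restart(host) else failed).append(host)
        if len(failed) > failure_tolerance:
            return restarted, failed, True
        if start + batch_size < len(hosts):
            sleep(wait_seconds)  # pause only between batches
    return restarted, failed, False


if __name__ == "__main__":
    waits: List[float] = []
    outcome = rolling_restart(
        ["h1", "h2", "h3", "h4", "h5"],
        restart=lambda host: host not in {"h2", "h3"},
        batch_size=2,
        wait_seconds=30,
        failure_tolerance=1,
        sleep=waits.append,  # record waits instead of sleeping
    )
    print(outcome, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; the hosts left untouched after a halt are exactly the ones that keep serving traffic.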
  • 16. Host Configuration Groups
        • Set custom configuration properties for one or more host groups (e.g. “host overrides”)
        • Important for handling “heterogeneous” HW clusters
          – Different memory, mount points, directories
        [Figure labels: HEAPSIZE=1024, HEAPSIZE=512]
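One way to picture host overrides is as a merge of the service defaults with the properties of the config group a host belongs to. The sketch below assumes that model; the function name, the group name `next-gen-slaves` and the host names are hypothetical, and only the HEAPSIZE values come from the slide.

```python
from typing import Dict, Mapping, Optional


def effective_config(
    defaults: Mapping[str, object],
    group_overrides: Mapping[str, Mapping[str, object]],
    host_to_group: Mapping[str, str],
    host: str,
) -> Dict[str, object]:
    """Return the defaults, with the host's group overrides applied on top."""
    config = dict(defaults)
    group: Optional[str] = host_to_group.get(host)
    if group is not None:
        config.update(group_overrides.get(group, {}))
    return config


if __name__ == "__main__":
    defaults = {"HEAPSIZE": 1024, "data_dir": "/hadoop/hdfs/data"}
    overrides = {"next-gen-slaves": {"HEAPSIZE": 512}}
    membership = {"slave-0042": "next-gen-slaves"}

    print(effective_config(defaults, overrides, membership, "slave-0001"))
    print(effective_config(defaults, overrides, membership, "slave-0042"))
```

Hosts outside any group keep the defaults untouched, so only the properties that actually differ need to be stated per group.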
  • 17. Staged Configuration Changes
        • Restart indicators
        • Push changes without affecting liveness of the service
  • 18. Ambari Blueprints
        • Blueprint defines a cluster layout and component configuration
        • Simplifies “Headless Installs”
        • Export blueprint from cluster
        • Boot and Save wizard with blueprint
        [Diagram: BLUEPRINT, submit to Ambari via REST; AMBARI, Ambari provisions cluster; CLUSTER]
        BLUEPRINT: <stack> <host> <service> <component> <config>
        HOST MANIFEST: <host> <meta>
        SERVICE CONFIGS: <props>
  • 19. Cluster create with Blueprint
        • POST /api/v1/blueprints/:blueprintName (201 Created)
        • POST /api/v1/clusters/:clusterName (202 Accepted)
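The two POST calls can be sketched with the Python standard library. The server address, blueprint name, cluster name, host names and the tiny two-group layout are all hypothetical, and the JSON field names follow my understanding of the Ambari Blueprints format, so check them against the documentation for your Ambari version. The requests are only built here, not sent, and authentication is left out.

```python
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical server


def post_request(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST; Ambari expects X-Requested-By on writes."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Requested-By": "ambari"},
    )


# Step 1: register the blueprint (cluster layout, no concrete hosts).
# Illustrative and incomplete: a real blueprint lists many more components.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.1"},
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "slaves", "cardinality": "2",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}
blueprint_req = post_request(f"{AMBARI}/api/v1/blueprints/demo-blueprint", blueprint)

# Step 2: create a cluster by mapping real hosts onto the blueprint's host groups.
cluster = {
    "blueprint": "demo-blueprint",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "slaves", "hosts": [{"fqdn": "slave1.example.com"},
                                     {"fqdn": "slave2.example.com"}]},
    ],
}
cluster_req = post_request(f"{AMBARI}/api/v1/clusters/demo-cluster", cluster)

if __name__ == "__main__":
    for req in (blueprint_req, cluster_req):
        print(req.get_method(), req.full_url)
    # To submit: urllib.request.urlopen(req), after adding credentials.
```

Per the slide, the first call is expected to answer 201 Created and the second 202 Accepted, since provisioning continues asynchronously after the cluster request is taken.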
  • 21. Bulk Host Operations
        • Perform operations such as Stop, Start, Restart, Decommission, Maintenance Mode in “bulk” form
        • Perform operations on all hosts, filtered hosts or a selected group of hosts
        • Perform host level operations, or component type operations
  • 22. Bulk Host Operations
        • 10+ ways to filter hosts: component type and state, alerts, stale configurations, maintenance mode, etc.
        • Component type start, stop, restart operations are performed in batches
  • 23. Maintenance Mode
        • Goal: silence alerts for services, hosts and components when performing maintenance
        • Ability to put Service or Host “Out of Service”
        • Alerts will be suspended for that item
        • Item will not respond to bulk operations (such as restarts)
  • 24. Maintenance Mode
        • Components inherit maintenance mode from either service or host
        • Service/Host in maintenance mode
          – Bulk operations skipped
          – Host/Service operations skipped (start all, stop all and restart all)
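The inheritance rule can be sketched as a predicate: a component counts as being in maintenance if it was flagged directly, or if its service or its host is. The data model below (`Component`, the two sets) is hypothetical and only illustrates the rule and its effect on bulk operations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Component:
    name: str
    service: str
    host: str
    own_maintenance: bool = False  # flag set directly on the component


def in_maintenance(
    component: Component,
    services_in_maintenance: Set[str],
    hosts_in_maintenance: Set[str],
) -> bool:
    """A component inherits maintenance mode from its service or its host."""
    return (
        component.own_maintenance
        or component.service in services_in_maintenance
        or component.host in hosts_in_maintenance
    )


def bulk_targets(
    components: Iterable[Component],
    services_in_maintenance: Set[str],
    hosts_in_maintenance: Set[str],
) -> List[Component]:
    """Components a bulk operation would act on; those in maintenance are skipped."""
    return [
        c for c in components
        if not in_maintenance(c, services_in_maintenance, hosts_in_maintenance)
    ]


if __name__ == "__main__":
    components = [
        Component("DATANODE", "HDFS", "host1"),
        Component("DATANODE", "HDFS", "host2"),
        Component("NODEMANAGER", "YARN", "host2"),
        Component("NODEMANAGER", "YARN", "host3"),
    ]
    # host2 is under maintenance, so both of its components are skipped.
    for c in bulk_targets(components, services_in_maintenance=set(),
                          hosts_in_maintenance={"host2"}):
        print(c.name, c.host)
```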
  • 25. Moving Masters
        • Move master components to different hosts
          – NameNode (including HA)
          – SecondaryNameNode
          – JobTracker (Hadoop 1)
          – ResourceManager (Hadoop 2)
  • 26. Views
  • 27. Ambari Views
        • Goal: Customize the Ambari Web experience
        • Allows creation of custom views (API and UI) of cluster
        • Gives users and admins a single entry point to cluster
        • Views complement Stack Extensibility
          – Stack Extensibility makes custom Stack Services available to Ambari
          – Views expose custom UI features for Services
        • Ambari Admins can entitle “views” to Ambari Web users
          – Entitlements framework for finer-grained permissions control for Ambari users
  • 28. Ambari Views – Demo
  • 29. Ambari Views – Packaging
        files-0.1.0-SNAPSHOT.jar
        ├── WEB-INF
        │   ├── web.xml
        │   └── lib
        ├── index.html
        ├── org
        │   └── apache
        │       └── ambari
        │           └── view
        │               └── filebrowser
        │                   ├── HdfsApi.class
        │                   └── ...
        └── view.xml

        # ls -l /var/lib/ambari-server/resources/views/
        -rw-r--r--. 1 root root 26023710 Jun 1 00:55 files-0.1.0-SNAPSHOT.jar
        -rw-r--r--. 1 root root 22578573 Jun 1 00:55 pig-0.1.0-SNAPSHOT.jar
        -rw-r--r--. 1 root root 54649972 Jun 1 00:55 slider-0.1.0-SNAPSHOT.jar
  • 30. Ambari Views: view.xml
        <view>
          <name>WEATHER</name>
          <label>Weather</label>
          <version>1.0.0</version>
          <parameter>
            <name>cities</name>
            <description>The list of cities.</description>
            <required>true</required>
          </parameter>
          <resource>
            <name>city</name>
            <plural-name>cities</plural-name>
            <id-property>id</id-property>
            <resource-class>org.apache.ambari.view.weather.CityResource</resource-class>
            <provider-class>org.apache.ambari.view.weather.CityResourceProvider</provider-class>
            <service-class>org.apache.ambari.view.weather.CityService</service-class>
          </resource>
          <instance>
            <name>EUROPE</name>
            <property>
              <key>cities</key>
              <value>London, UK;Paris;Munich</value>
            </property>
          </instance>
        </view>
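To make the structure of this descriptor concrete, here is a small sketch that reads a trimmed copy of the WEATHER view.xml with Python's standard XML parser and pulls out the view name, its declared parameters, its resources and the per-instance property values. The `summarize_view` helper is hypothetical; it is not part of the Ambari Views framework.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the descriptor from the slide (class names omitted).
VIEW_XML = """
<view>
  <name>WEATHER</name>
  <label>Weather</label>
  <version>1.0.0</version>
  <parameter>
    <name>cities</name>
    <description>The list of cities.</description>
    <required>true</required>
  </parameter>
  <resource>
    <name>city</name>
    <plural-name>cities</plural-name>
    <id-property>id</id-property>
  </resource>
  <instance>
    <name>EUROPE</name>
    <property>
      <key>cities</key>
      <value>London, UK;Paris;Munich</value>
    </property>
  </instance>
</view>
"""


def summarize_view(xml_text: str) -> dict:
    """Extract the view identity, parameters, resources and instances."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "version": root.findtext("version"),
        "parameters": [p.findtext("name") for p in root.findall("parameter")],
        "resources": [r.findtext("name") for r in root.findall("resource")],
        "instances": {
            inst.findtext("name"): {
                prop.findtext("key"): prop.findtext("value")
                for prop in inst.findall("property")
            }
            for inst in root.findall("instance")
        },
    }


if __name__ == "__main__":
    summary = summarize_view(VIEW_XML)
    print(summary["name"], summary["version"])
    print(summary["instances"]["EUROPE"]["cities"].split(";"))
```

The descriptor declares the `cities` parameter once at view level, and each `<instance>` supplies its own value for it, which is why the EUROPE instance carries a `cities` property.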
  • 31. Ambari Views – Framework API
        • GET
          – http://server:8080/api/v1/views
          – http://server:8080/api/v1/views/{view-id}/versions
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
        • POST – Create new instance of view with appropriate parameters
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
          – Parameter example for HDFS view: dataworker.defaultFS, dataworker.username
        • PUT – Update {view-instance} with modified parameters
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
        • DELETE – Delete {view-instance}
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
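The URL hierarchy above nests views, versions and instances. A minimal sketch of a builder for these paths follows; `views_url` is a hypothetical helper, and the host `server:8080` is the placeholder used on the slide.

```python
from typing import Optional

BASE = "http://server:8080/api/v1"  # placeholder host from the slide


def views_url(
    view_id: Optional[str] = None,
    version: Optional[str] = None,
    instance: Optional[str] = None,
    base: str = BASE,
) -> str:
    """Build a Views framework URL, stopping at the deepest level provided."""
    parts = [base.rstrip("/"), "views"]
    if view_id is not None:
        parts += [view_id, "versions"]
        if version is not None:
            parts += [version, "instances"]
            if instance is not None:
                parts.append(instance)
    return "/".join(parts)


if __name__ == "__main__":
    print(views_url())                          # list all views
    print(views_url("FILES"))                   # versions of one view
    print(views_url("FILES", "0.1.0"))          # instances of one version
    print(views_url("FILES", "0.1.0", "HDFS"))  # one instance (GET/POST/PUT/DELETE)
```

The last form is the one the slide uses for POST, PUT and DELETE; only the HTTP method changes between creating, updating and deleting an instance.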
  • 32. Ambari Views – View Instance API
        • GET UI
          – http://server:8080/views/{view-id}/{view-version}/{view-instance}
        • GET API
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/resources/{resource-name}
          – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/{servlet-path}
        • Example: HDFS
          – GET: http://views-1:8080/views/FILES/0.1.0/HDFS
          – GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/listdir?path=%2F
          – GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/download/browse?path=%2Fuser%2Fhdfs%2FplayerYears.pig&download=true
          – POST: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/rename
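The HDFS examples pass file paths as percent-encoded query parameters (`/` becomes `%2F`). The sketch below rebuilds two of the example URLs with the standard library's `urlencode`; `instance_resource_url` is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlencode


def instance_resource_url(
    server: str,
    view_id: str,
    version: str,
    instance: str,
    resource_path: str,
    **query: str,
) -> str:
    """Build a view-instance resource URL with percent-encoded query parameters."""
    url = (
        f"http://{server}/api/v1/views/{view_id}/versions/{version}"
        f"/instances/{instance}/resources/{resource_path}"
    )
    if query:
        url += "?" + urlencode(query)  # encodes "/" as %2F
    return url


if __name__ == "__main__":
    # Directory listing of the HDFS root through the FILES view.
    print(instance_resource_url(
        "views-1:8080", "FILES", "0.1.0", "HDFS",
        "files/fileops/listdir", path="/"))
    # Download of a single file.
    print(instance_resource_url(
        "views-1:8080", "FILES", "0.1.0", "HDFS",
        "files/download/browse",
        path="/user/hdfs/playerYears.pig", download="true"))
```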
  • 33. Ambari Views – Single cluster interface
        • Administrators can control cluster
        • Data Workers can use cluster
  • 34. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Jobs Page 38
  • 35. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs Page 39 • Hadoop 1.0: MapReduce – Visualize MapReduce jobs in swimlanes – Task scatter plots across jobs
  • 36. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs Page 40 • Hadoop 2.0: YARN + Tez
  • 37. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs Page 41 • Visualize Hive queries using Tez engine
  • 38. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs Page 42
  • 39. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs - Counters Page 43 FILE_BYTES_READ + HDFS_BYTES_READ FILE_BYTES_WRITTEN + HDFS_BYTES_WRITTEN HDFS_WRITE_OPS / HDFS_BYTES_WRITTEN HDFS_READ_OPS / HDFS_BYTES_READ FILE_WRITE_OPS / FILE_BYTES_WRITTEN FILE_READ_OPS / FILE_BYTES_READ SPILLED_RECORDS
  • 40. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs – DAG Graph Page 44 Summary Metrics • Input • Output • Tez Tasks • Spilled Records Vertex Types • Map Vertex • Reduce Vertex • Union Vertex Hive Operators Edge Types • Scatter Gather • Broadcast • Contains
  • 41. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs Page 45 • Event notification flow ATS (Application Timeline Server – YARN) Ambari PUSH PULL
  • 42. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs - Configurations Page 46
  • 43. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Ambari Jobs – Scaling Page 47
  • 44. © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Extensibility Page 48
  • 45. Ambari Stacks • Goal: Reduce time + effort to add new Services to Ambari for provisioning, management and monitoring • Ambari defines a consistent Service lifecycle management interface that can be extended • Dynamically add Stacks + Services definitions [Diagram: Ambari ({rest}, <ambari-web>) with two example Stacks, HDP-2.0 (HDFS, YARN, MR2, Hive, Pig, Oozie) and 2.0-GlusterFS (GlusterFS, YARN, MR2, Hive), each with slots marked NEW for added services]
  • 46. Stack Details • Stacks define Services + Repos – What is in the Stack, and where to get the bits • Each Service has a definition – What Components are part of the Service • Each Service has defined lifecycle commands – start, stop, status, install, configure • Lifecycle is controlled via command scripts • Ability to define “custom” commands • Ability to “extend” Stacks [Diagram: Ambari Server holding the Stack (Repos, Service Definitions in XML, Command Scripts in Python), connected to the Ambari Agents]
  • 47. Stack Mechanics • Ambari Server reads Stack definitions on start • Ambari Server sends a command to Agents • Agents download Stack definition + command scripts • Agent executes command • If the Stack definition changes, Agent will request latest Stack definition + command scripts
  • 48. Declarative Definition
  • 49. In closing …
  • 50. Everyone is welcome to contribute • Thank you for all the contributions • Bring your favorite Hadoop services to Ambari • Useful Links – Website – http://ambari.apache.org – Mailing Lists – http://ambari.apache.org/mail-lists.html – Development Wiki – https://cwiki.apache.org/confluence/display/AMBARI • Current and Upcoming Releases – Ambari 1.6.1 (pending release) – Ambari 1.6.0 (May)
  • 51. Thank you.
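
The URL patterns on the Views API slides (Framework API and View Instance API) can be sketched as a small URL builder. This is only an illustration: the function names `instances_url` and `resource_url` are hypothetical and not part of Ambari; only the path layout and the FILES/HDFS example values come from the slides. No request is sent, the code just assembles strings.

```python
from urllib.parse import quote, urlencode

# Base path as shown on the slides; the helper names below are hypothetical.
API_ROOT = "/api/v1/views"


def instances_url(server, view_id, view_version, view_instance=None):
    """Build the framework API URL for a view version's instances,
    or for one named instance when view_instance is given."""
    url = "http://{}{}/{}/versions/{}/instances".format(
        server, API_ROOT, quote(view_id), quote(view_version))
    if view_instance is not None:
        url += "/" + quote(view_instance)
    return url


def resource_url(server, view_id, view_version, view_instance,
                 resource_path, **query):
    """Build a view instance resource URL, with optional query parameters."""
    url = "{}/resources/{}".format(
        instances_url(server, view_id, view_version, view_instance),
        resource_path.strip("/"))
    if query:
        # urlencode percent-encodes "/" as %2F, matching the slide example.
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    print(instances_url("server:8080", "FILES", "0.1.0", "HDFS"))
    print(resource_url("views-1:8080", "FILES", "0.1.0", "HDFS",
                       "files/fileops/listdir", path="/"))
```

The same instance URL is the target for POST, PUT and DELETE on the Framework API slide, so a client would only vary the HTTP method and the parameter payload.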

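The Counters slide lists several sums and ratios of Tez counters without spelling out how they are computed. A minimal sketch, assuming the expressions are plain arithmetic over one vertex's counter values: the function name `derived_counters`, the result keys and the sample numbers are all hypothetical; only the counter names and the expressions come from the slide.

```python
def derived_counters(counters):
    """Compute the sums and ratios listed on the Counters slide.

    `counters` maps counter names to numeric values for one vertex.
    Missing counters are treated as 0, and a ratio with a zero
    denominator is reported as None instead of raising.
    """
    def value(name):
        return counters.get(name, 0)

    def ratio(numerator, denominator):
        den = value(denominator)
        return value(numerator) / den if den else None

    return {
        "bytes_read": value("FILE_BYTES_READ") + value("HDFS_BYTES_READ"),
        "bytes_written": (value("FILE_BYTES_WRITTEN")
                          + value("HDFS_BYTES_WRITTEN")),
        "hdfs_write_ops_per_byte": ratio("HDFS_WRITE_OPS", "HDFS_BYTES_WRITTEN"),
        "hdfs_read_ops_per_byte": ratio("HDFS_READ_OPS", "HDFS_BYTES_READ"),
        "file_write_ops_per_byte": ratio("FILE_WRITE_OPS", "FILE_BYTES_WRITTEN"),
        "file_read_ops_per_byte": ratio("FILE_READ_OPS", "FILE_BYTES_READ"),
        "spilled_records": value("SPILLED_RECORDS"),
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "FILE_BYTES_READ": 100, "HDFS_BYTES_READ": 300,
        "FILE_BYTES_WRITTEN": 50, "HDFS_BYTES_WRITTEN": 200,
        "HDFS_WRITE_OPS": 4, "HDFS_READ_OPS": 6,
        "FILE_WRITE_OPS": 0, "FILE_READ_OPS": 5,
        "SPILLED_RECORDS": 7,
    }
    for name, result in derived_counters(sample).items():
        print(name, result)
```
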
Editor's notes

  1. Welcome to the Apache Ambari talk. Your speakers today are myself, Siddharth, and my colleague Srimanth, both from Hortonworks.
  2. With increased adoption of Ambari throughout the enterprise, the focus at the moment is scaling out to thousands of nodes. With that in mind, this talk demonstrates operations on a 2K-node cluster, with a glimpse at future goals. We will look at the features that are part of 1.6.0, along with the things that truly identify Ambari as a platform: Views and Extensibility. If you attend the birds-of-a-feather session tomorrow, we can do a further deep dive into these new developments.
  3. This slide represents Ambari’s position in the Hadoop technology stack and highlights key integration points with services that are either cloud compute providers or big data analytics platforms. By the end of the talk you should have a fairly good idea of how Ambari enables the integration of these providers with the Hadoop ecosystem.
  4. Orchestrator: the Ambari state machine combined with the Action Scheduler and the Heartbeat Handler. Request Dispatcher: the Service Provider Interface and Resource Provider layer. Clusters, Stacks, etc. are all resources from the Ambari API standpoint. The monitoring subsystem comprises Ganglia as the metrics system and Nagios for alerts.
  5. Host component isolation for Ambari Server, Ganglia, Nagios, and the masters. All testing was done on VMs in the cloud.
  6. So now we are going to look at a video. The story here is: let's say you have a sizeable cluster with a need for additional compute capacity, and the new hardware that you intend to add needs to be configured differently from the existing cluster. We begin by looking at the dashboard, which shows the 2000 slave nodes and the rest of the customizable Ambari widgets. The next step is to choose the groups of hosts that you want to customize. What we are doing here is grouping hosts together using Config Groups and giving the group a name. Let's select a few DataNodes to demonstrate this. Note: since this is a paid cluster and it is expensive to keep running, we are showing you a video. The Config Group manager allows you to filter by component and by regular expression; we make sure DataNode hosts are the only ones in the filter. Next, use an expression to choose the hosts you want; here I just chose them at random. Now on to actually making config changes. Restart All restarts everything in one shot and applies the config. The other option allows you to do a rolling restart.
  7. When the rubber meets the road, what do we see as the performance bottleneck? The monitoring and alerting subsystems on large clusters are bogged down by the number of I/O operations needed to write relatively small amounts of data at a high frequency to permanent storage. These iostat numbers are close to what we saw when we began optimizing performance; as you can see, we were writing at 1 GB/min.
  8. The most significant metric I would like to present is the load average improvement achieved through the performance tuning effort. It involved tuning the rrdcached daemon used to write Ganglia data, as well as reading that data back through the Ambari API and Nagios. The objective of this exercise is to certify Ambari with 2K nodes on run-of-the-mill VMs, with little to no optimization below the application stack, and achieve acceptable performance for all management and monitoring operations. In theory it is possible to go above and beyond this magic number. The goal is to scale to 10K+ nodes managed by a single Ambari instance.
  9. This is still a conceptual architecture, and you can follow the discussion on the Apache Jira that is listed. A quick word on the architecture: it involves scaling out the collector daemon in proportion to cluster size. The Views that you see in this picture are covered later in the slide deck; they are here to represent the capability to extend Ambari to provide a user interface of your choice for visualizing data in the Hadoop cluster.
  10. Integrated with the open source Quartz scheduler API to schedule a batch of requests to be executed as per a schedule.
  11. Rolling restart is the first use case for request scheduling. Schedule and go home.
  12. A Host Configuration Group is a way of associating a set of configurations with a group of hosts, per service. This feature is supported with Blueprints as well, so the touch-less install can still incorporate heterogeneous target hosts.
  13. - Additionally any custom property can be added to existing configuration
  14. - Apply changed configs selectively, and know exactly when and where to apply them.
  15. A Blueprint, as the name suggests, is a declarative definition of the cluster, which can be exported as a document from a live cluster or imported to create a new cluster from an existing blueprint. Real-world use cases: the Savanna project, Launchpad on Microsoft Azure.
  16. A quick look at how to create a cluster using a blueprint. Define Host Groups: these can be thought of as the unique sets of components and configurations that represent hosts in your cluster, with cardinality from 1 to N. Capture non-default configuration overrides. Point to the stack name and version to use. When you POST to create a cluster, you get back a request id that can be used to track the progress of the deployment.
  17. A real-world use case of blueprints: HDP Launchpad for Azure (Linux) lets you spin up HDP clusters super easily, with no need for you to spin up VMs, create images, set up ssh, etc. All you need to get started is your Azure account (with a credit card in good standing). Once you get the launchpad going, it will do *everything* for you and publish the Ambari URL as the control entry point. Under the hood, after running some Azure provisioning and setup scripts, all the goodness comes from Ambari Blueprints.
  18. When you manage a cluster of 2000 nodes, you need the ability to perform operations in bulk. Bulk host operations are now available on the Hosts page. Basically, you identify which hosts: either all, filtered, or selected. Then you perform operations: either host-level or component-level operations. Components generally tend to be slaves/workers, which are larger in number.
  19. Component operations tend to perform operations in batches. For clusters with 2000 nodes you need good filters to easily find the appropriate hosts. Ambari provides 13 filters on its hosts page to help you.
  20. So let's say: hardware change/replacement on some nodes; experimenting with service configurations; turning off a service completely; deleting cluster nodes. Maintenance Mode silences alerts and skips operations.
  21. Inheritance cannot be turned off on lower levels
  22. We support safely moving the following master components from one host to another. Even the 2 namenodes in HDFS HA.
  23. Hadoop is an ecosystem with many services, many users and many, many use cases. Even with all the functionality provided in Ambari, there will always be a different way to use and view your cluster. To allow users and admins to extend and contribute their own ‘view’ of the cluster, Ambari is providing the ‘Ambari Views’ framework. Developers can now create their ‘view’ using this framework. It gives users and administrators a single entry point into the cluster and allows for very interesting possibilities. Views also nicely complement stack extensibility on the backend, by providing appropriate views for them in the front end. Question: What is the admin functionality of views?
  24. This is Tech Preview being shown
  25. view.xml – view descriptor; WEB-INF/lib – 3rd party libraries; WEB-INF/web.xml – define custom servlets (non-REST); classes – application logic; index.html/javascripts/… – UI
  26. View descriptor is the central entry point. Here you can see the view Id, display label you see in the menu, version of the view. Each JAR is for a version of the view. A view version can have many instances of the view. Each view can also define the parameters it needs to work – here you see list of cities this weather view needs. You also see a REST resource defined – all you need to implement is the Java bean and a JAX-RS annotated class. Each view can optionally define instances by default… here you see Europe. HDFS view does not have any instances because location of NameNode is a runtime value – not known at packaging time.
  27. Once the view JAR is placed into Ambari, you can see the views, versions and instances. You can create/update/delete view instances via REST calls. So if your 3rd party tool wants a view to HDFS, it can create an instance and send the user to the link.
  28. Something that is being worked on is administration ability for views. Admins can configure views, provide entitlement for users, etc.
  29. So admins can control the cluster, and users can view the cluster and use it.
  30. In Hadoop 1.0 we visualized MapReduce jobs, their dependencies, and how the map and reduce tasks performed.
  31. In Hadoop 2.0 MapReduce has been made more generic in Apache Tez. Apache™ Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks. As you can see Hive, Pig and other data processing services are being ported on top of Tez. For Hadoop 2.0 Ambari visualizes Hive queries using Tez engine.
  32. Each Hive + Tez query is shown in the jobs table. Going to an individual job shows the Tez DAG mixed in with Hive information.
  33. HDFS_ prefixed counters come from HDFS. They generally tend to be on the first and last vertices of the DAG, because that is where data is read from and written to HDFS. FILE_ prefixed counters are local disk accesses for the vertex; they represent data read/written during spilling, not data transferred between vertices. SPILLED_RECORDS: in Tez, spilling of records can happen not only during vertex output (like MapReduce) but also at vertex input. For a vertex, this number covers both. Tasks: FILE_BYTES_READ, FILE_BYTES_WRITTEN = spill bytes size (3 reads out of 3r+3w), local disk only; does not include transporting across tasks; read configs. HDFS_BYTES_READ|WRITTEN = generally on first and last vertices where HDFS is accessed. HDFS_READ_OPS = listing directories (direct HDFS counters). HDFS_WRITE_OPS = FS changes (direct HDFS counters): create folder, concat file, mkdir, etc. SPILLED_RECORDS = 3w+3r+1sort-w = records in 3+1; they occur in output (when spilling locally when > memory) and in input (when collecting from multiple inputs); if a vertex has both input and output, this will be the sum of both.
  34. Summary metrics are shown for all vertices, so that you can compare relative performance of vertices.
  35. Hive and Tez have hooks to push notifications to ATS. Ambari pulls/GETs information from ATS. Other components plan to use ATS more – so Ambari should be able to show other types of Jobs.
  36. To enable Hive + Tez, admins should go to Hive configurations and set “hive.execution.engine” to “tez”. Default is “mr”. Other important tez configs are shown – like YARN container size etc for Hive+Tez queries.
  37. The Jobs viewer can handle large queries. This one, for example, has approximately 70 Tez vertices and 12 reduce vertices. The graph is more readable than the text above when analyzing issues.
  38. - What truly identifies Ambari as a platform – Ability to add new services and manage and monitor a custom stack of components
  39. A Stack is an all-inclusive, self-contained definition of all services and their life cycle within Ambari. Let's start by encapsulating components and configuration in a stack definition. Next, allow a developer to define the component life cycle by declaring relationships between the different states of a component. The REST API allows you to discover what is available. Last, plug it into Ambari to bring it all together.
  40. Command scripts are the way to tell Ambari what needs to be executed in order to achieve a state change. For example, going from INSTALLED to STARTED entails executing the user-defined start script of a component in the desired stack. Custom Commands and Custom Actions are similar to command scripts but are independent of a state change and can be executed on demand using the Ambari API. Examples: decommission a DataNode, run the rebalancer, verify Kerberos settings. Extension makes it easy to add new stacks.
  41. Command scripts are bundled with the server and downloaded to the agents. At registration time, agents check that the MD5 checksum of the script archive on the server matches the one in the agent cache; if not, the agent downloads new definitions from the server. This makes on-demand / on-site modifications easy to make and verify.
  42. HBASE service definition in the stack. The metrics.json file defines all metrics emitted by HBASE as well as how these metrics show up in the Ambari API. The service definition contains configuration, a package of command scripts, and the definition of the service in metainfo.xml. metainfo.xml: links the HBASE_MASTER component to the script that defines the life cycle commands (start, stop, install, configure) and custom commands, if any. Package: the actual command scripts that will be executed on the agents. Example of a command script: it is important to mention that Ambari's Python resource management framework allows a developer to extend a base class called Script and define resources, similar to other languages like Puppet.
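
The checksum comparison described in note 41 can be sketched with Python's standard `hashlib`. This is an illustration of the idea only, not Ambari's agent code: the function names `archive_md5` and `needs_refresh` are hypothetical, and the archive is modeled as an in-memory byte string rather than a file in the agent cache.

```python
import hashlib


def archive_md5(archive_bytes):
    """Return the hex MD5 digest of a script archive's contents."""
    return hashlib.md5(archive_bytes).hexdigest()


def needs_refresh(server_md5, cached_archive):
    """Decide whether the agent should download new definitions.

    server_md5     -- hex digest reported by the server (assumed input)
    cached_archive -- bytes of the archive in the agent cache, or None
                      if nothing has been cached yet
    """
    if cached_archive is None:
        return True
    return archive_md5(cached_archive) != server_md5


if __name__ == "__main__":
    # Made-up archive contents for illustration.
    server_copy = b"command scripts, version 2"
    stale_cache = b"command scripts, version 1"
    server_hash = archive_md5(server_copy)
    print(needs_refresh(server_hash, stale_cache))
    print(needs_refresh(server_hash, server_copy))
```

Comparing digests instead of whole archives means the agent only transfers the archive when an on-site modification has actually changed it.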