SlideShare une entreprise Scribd logo
1  sur  38
Hadoop Operations –
Best Practices from the Field
June 11, 2015
Chris Nauroth
email: cnauroth@hortonworks.com
twitter: @cnauroth
Suresh Srinivas
email: suresh@hortonworks.com
twitter: @suresh_m_s
© Hortonworks Inc. 2011
About Me
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Major contributor to HDFS ACLs, Windows compatibility, and operability improvements
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Page 2
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Agenda
• Analysis of Hadoop Support Cases
– Support case trends
– Configuration
– Software Improvements
• Key Learnings and Best Practices
– HDFS ACLs
– HDFS Snapshots
– Reporting DataNode Volume Failures
Page 3
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Support Case Trends – Proportional Cases per Month
Page 4
Architecting the Future of Big Data
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
HDFS
Map Reduce
YARN
Other (37 components)
© Hortonworks Inc. 2011
Support Case Trends – Root Cause
Page 5
Architecting the Future of Big Data
0
200
400
600
800
1000
1200
Customer Environment
(Non HDP)
Documentation Defect Documentation Gap Documentation Not
Utilized
Education -
Configuration
Needs Training Product Defect
YARN
Map Reduce
HDFS
© Hortonworks Inc. 2011
Support Case Trends
• Core Hadoop components (HDFS, YARN and MapReduce) are used across all deployments, and
therefore receive proportionally more support cases than other ecosystem components.
• Misconfiguration is the dominant root cause.
• Documentation is a close second.
• We are constantly improving the code to eliminate operational issues, help with diagnosis and
provide increased visibility.
• Best practices get incorporated into Apache Ambari for improved defaults, simplified
configuration and deeper monitoring.
Page 6
Architecting the Future of Big Data
Configuration
© Hortonworks Inc. 2011
Configuration - Hardware and Cluster Sizing
• Considerations
–Larger clusters heal faster on nodes or disk failure
–Machines with huge storage take longer to recover
–More racks give more failure domains
• Recommendations
– Get good-quality commodity hardware
– Buy the sweet-spot in pricing: 3TB disk, 96GB, 8-12 cores
– More memory is better – real time is memory hungry!
– Before considering fatter machines (1U 6 disks vs. 2U 12 disks)
– Get to 30-40 machines or 3-4 racks
–Use pilot cluster to learn about load patterns
– Balanced hardware for I/O, compute or memory bound
– More details - http://tinyurl.com/hwx-hadoop-hw
Page 8
© Hortonworks Inc. 2011
Configuration – JVM Tuning
• Avoid JVM issues
– Use 64 bit JVM for all daemons
– Compressed OOPS enabled by default (6 u23 and later)
– Java heap size
– Set same max and starting heapsize, Xmx == Xms
– Avoid java defaults – configure NewSize and MaxNewSize
– Use 1/8 to 1/6 of max size for JVMs larger than 4G
– Configure –XX:PermSize=128 MB, -XX:MaxPermSize=256 MB
– Use low-latency GC collector
– -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>
– High <N> on Namenode and JobTracker or ResourceManager
– Important JVM configs to help debugging
– -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
– -XX:ErrorFile=<file>
– -XX:+HeapDumpOnOutOfMemoryError
Page 9
© Hortonworks Inc. 2011
Configuration
• Deploy with QuorumJournalManager for high availability
• Configure open fd ulimit
– Default 1024 is too low
– 16K for datanodes, 64K for Master nodes
• Use version control for configuration!
Page 10
© Hortonworks Inc. 2011
Configuration
• Use disk fail in place for datanodes: dfs.datanode.failed.volumes.tolerated
– Disk failure is no longer datanode failure
– Especially important for large density nodes
• Set dfs.namenode.name.dir.restore to true
– Restores NN storage directory during checkpointing
• Take periodic backups of namenode metadata
– Make copies of the entire storage directory
• Set aside a lot of disk space for NN logs
– It is verbose – set aside multiple GBs
– Many installs configure this too small
– NN logs roll with in minutes – hard to debug issues
Page 11
© Hortonworks Inc. 2011
Configuration – Monitoring Usage
• Cluster storage, nodes, files, blocks grows
– Update NN heap, handler count, number of DN xceivers
– Tweak other related config periodically
• Monitor the hardware usage for your work load
– Disk I/O, network I/O, CPU and memory usage
– Use this information when expanding cluster capacity
• Monitor the usage with HADOOP metrics
– JVM metrics – GC times, Memory used, Thread Status
– RPC metrics – especially latency to track slowdowns
– HDFS metrics
– Used storage, # of files and blocks, total load on the cluster
– File System operations
– MapReduce Metrics
– Slot utilization and Job status
• Tweak configurations during upgrades/maintenance on an ongoing basis
Page 12
HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Install & Configure: Ambari Guided Configuration
Guide configuration and provide
recommendations for the most
common settings.
(HBase Example Shown here)
Software Improvements
Real Incidents and Software Improvements to Address Them
© Hortonworks Inc. 2011
Don’t edit the metadata files!
• Editing can corrupt the cluster state
– Might result in loss of data
• Real incident
– NN misconfigured to point to another NN’s metadata
– DNs can’t register due to namespace ID mismatch
– System detected the problem correctly
– Safety net ignored by the admin!
– Admin edits the namenode VERSION file to match ids
Mass deletion of unknown blocks that do not
exist in that namespace
Page 15
© Hortonworks Inc. 2011
Improvement
• Pause deletion of blocks when the namenode starts up
– https://issues.apache.org/jira/browse/HDFS-6186
– Supports configurable delay of block deletions after NameNode startup
– Gives an admin extra time to diagnose before deletions begin
• Show when block deletion will start after NameNode startup in WebUI
– https://issues.apache.org/jira/browse/HDFS-6385
– The web UI already displayed the number of pending block deletions
– This enhanced the display to indicate when actual deletion will begin
Page 16
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Block Deletion Start Time
Page 17
Architecting the Future of Big Data
New
© Hortonworks Inc. 2011
Guard Against Accidental Deletion
• rm –r deletes the data at the speed of Hadoop!
– ctrl-c of the command does not stop deletion!
– Undeleting files on datanodes is hard & time consuming
– Immediately shutdown NN, unmount disks on datanodes
– Recover deleted files
– Start namenode without the delete operation in edits
• Enable Trash
• Real Incident
– Customer is running a distro of Hadoop with trash not enabled
– Deletes a large dir (100 TB) and shuts down NN immediately
– Support person asks NN to be restarted to see if trash is enabled!
Blocks start deleting
Page 18
© Hortonworks Inc. 2011
Improvement
• HDFS Snapshots
– https://issues.apache.org/jira/browse/HDFS-2802
– A snapshot is a read-only point-in-time image of part of the file system
– A snapshot created before a deletion can be used to restore deleted data
– More coverage of snapshots later in the presentation
• HDFS ACLs
– https://issues.apache.org/jira/browse/HDFS-4685
– Finer-grained control of file permissions can help prevent an accidental deletion
– More coverage of ACLs later in the presentation
Page 19
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Unexpected error during HA HDFS upgrade
• Background: HDFS HA Architecture
– http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
• Real Incident
– During upgrade, NameNode calls every JournalNode to request backup of metadata directory, which renames
“current” directory to “previous.tmp”.
– Permissions incorrect on metadata directory for 1 out of 3 JournalNodes.
– The hdfs user is not authorized to rename. Backup fails for that JournalNode, so upgrade process aborts with
error.
Root cause not easily identifiable, long time to
recover
Page 20
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Improvement
• Improve diagnostics on storage directory rename operations by using native code.
– https://issues.apache.org/jira/browse/HDFS-7118
– Logs additional root cause information for rename failure. For example, EACCES
• Split error checks in into separate conditions to improve diagnostics.
– https://issues.apache.org/jira/browse/HDFS-7119
– Splits a log message about failure to delete or rename into separate log messages to clarify which specific action
failed
• When aborting NameNode or JournalNode, write the contents of the metadata directories and
permissions to logs.
– https://issues.apache.org/jira/browse/HDFS-7120
– Usually the first information asked of the user, so we can automate this
• For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that
the operation can succeed.
– https://issues.apache.org/jira/browse/HDFS-7121
– Prevents need for manual cleanup on 2 out of 3 JournalNodes where backup succeeded
Page 21
Architecting the Future of Big Data
Key Learnings and Best Practices
Features that Help Improve Production Operations
© Hortonworks Inc. 2011
HDFS ACLs
• Existing HDFS POSIX permissions good, but not flexible enough
– Permission requirements may differ from the natural organizational hierarchy of users and groups.
• HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX
ACL model.
– An ACL (Access Control List) provides a way to set different permissions for specific named users or named
groups, not only the file’s owner and file’s group.
Page 23
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS File Permissions Example
• Authorization requirements:
–In a sales department, they would like a single user Maya (Department Manager) to
control all modifications to sales data
–Other members of sales department need to view the data, but can’t modify it.
–Everyone else in the company must not be allowed to view the data.
• Can be implemented via the following:
Read/Write perm for user
maya
User
Group
Read perm for group sales
File with sales data
© Hortonworks Inc. 2011
HDFS ACLs
• Problem
–No longer feasible for Maya to control all modifications to the file
– New Requirement: Maya, Diane and Clark are allowed to make modifications
– New Requirement: New group called executives should be able to read the sales data
–Current permissions model only allows permissions at 1 group and 1 user
• Solution: HDFS ACLs
–Now assign different permissions to different users and groups
Owner
Group
Others
HDFS
Directory
… rwx
… rwx
… rwx
Group D … rwx
Group F … rwx
User Y … rwx
© Hortonworks Inc. 2011
HDFS ACLs
New Tools for ACL Management (setfacl, getfacl)
– hdfs dfs -setfacl -m group:execs:r-- /sales-data
– hdfs dfs -getfacl /sales-data # file: /sales-data # owner: maya # group:
sales user::rw- group::r-- group:execs:r-- mask::r-- other::--
– How do you know if a directory has ACLs set?
– hdfs dfs -ls /sales-data Found 1 items -rw-r-----+ 3 maya sales 0
2014-03-04 16:31 /sales-data
© Hortonworks Inc. 2011
HDFS ACLs Best Practices
• Start with traditional HDFS permissions to implement most permission requirements.
• Define a smaller number of ACLs to handle exceptional cases.
• A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that
has only traditional permissions.
Page 27
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS Snapshots
• HDFS Snapshots
– A snapshot is a read-only point-in-time image of part of the file system
– Performance: snapshot creation is instantaneous, regardless of data size or subtree depth
– Reliability: snapshot creation is atomic
– Scalability: snapshots do not create extra copies of data blocks
– Useful for protecting against accidental deletion of data
• Example: Daily Feeds
hdfs dfs -ls /daily-feeds
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-16
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
Page 28
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS Snapshots
• Create a snapshot after each daily load
hdfs dfsadmin -allowSnapshot /daily-feeds
Allowing snaphot on /daily-feeds succeeded
hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17
Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17
• User accidentally deletes data for 2014-10-16
hdfs dfs -ls /daily-feeds
Found 4 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17
Page 29
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HDFS Snapshots
• Snapshots to the rescue: the data is still in the snapshot
hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17
Found 5 items
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-
feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13
drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-
feeds/.snapshot/snapshot-to-2014-10-17/2014-10-14
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-
feeds/.snapshot/snapshot-to-2014-10-17/2014-10-15
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-
feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16
drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-
feeds/.snapshot/snapshot-to-2014-10-17/2014-10-17
• Restore data from 2014-10-16
hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds
Page 30
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Reporting DataNode Volume Failures
• Configuring dfs.datanode.failed.volumes.tolerated > 0 enables a DataNode to keep running after
volume failures
• DataNode is still running, but capacity is degraded
• HDFS already provided a count of failed volumes for each DataNode, but no further details
• Apache Hadoop 2.7.0 provides more information: failed path, estimated lost capacity and failure
date/time
• An administrator can use this information to prioritize cluster maintenance work
Page 31
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Reporting DataNode Volume Failures
Page 32
Architecting the Future of Big Data
New
© Hortonworks Inc. 2011
Reporting DataNode Volume Failures
Page 33
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Reporting DataNode Volume Failures
Page 34
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Reporting DataNode Volume Failures
• Everything in the web UI is sourced from standardized Hadoop metrics
– Each DataNode publishes its own metrics
– NameNode publishes aggregate information from every DataNode
• Metrics accessible through JMX or the HTTP /jmx URI
• Integrated in Ambari
• Can be integrated into your preferred management tools and ops dashboards
Page 35
Architecting the Future of Big Data
New System to Manage the Health of Hadoop
Clusters
• Ambari Alerts are installed and configured by default
• Health Alerts and Metrics managed via Ambari Web
© Hortonworks Inc. 2011
Summary
• Configuration
– Prevent garbage collection issues
– Configure for redundancy
– Retune configuration in response to metrics
• HDFS ACLs
– Implement fine-grained authorization rules on files
– Can protect against accidental file manipulations
• HDFS Snapshots
– Point-in-time image of part of the filesystem
– Useful for restoring to a prior state after accidental file manipulation
• Reporting DataNode Volume Failures
– Metrics and web UI exposing information about volume failures on DataNodes
– Useful for planning cluster maintenance work
• Use Ambari
– Helps install, configure, monitor and manage Hadoop clusters
– Incorporates the latest best practices
Page 37
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Thank you, Q&A
Resource Location
Hardware
Recommendations for
Apache Hadoop
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-
guide/content/ch_hardware-recommendations.html
HDFS operational and
debuggability
improvements
https://issues.apache.org/jira/browse/HDFS-6185
HDFS ACLs Blog Post http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
HDFS Snapshots Blog Post http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/
Learn more
Contact me with your operations questions and suggestions
Chris Nauroth – cnauroth@hortonworks.com

Contenu connexe

Tendances

Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementDataWorks Summit/Hadoop Summit
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitDataWorks Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemDataWorks Summit/Hadoop Summit
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 

Tendances (20)

Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 

En vedette

Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaLucidworks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchMark Miller
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and TestingMark Miller
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Solr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchSolr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchCloudera, Inc.
 

En vedette (10)

Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Solr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchSolr+Hadoop = Big Data Search
Solr+Hadoop = Big Data Search
 

Similaire à Hadoop operations-2015-hadoop-summit-san-jose-v5

Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2hdhappy001
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryChris Nauroth
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Chris Nauroth
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Perforce
 
Democratizing Memory Storage
Democratizing Memory StorageDemocratizing Memory Storage
Democratizing Memory StorageDataWorks Summit
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopHortonworks
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...SpringPeople
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopMike Pittaro
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
Establishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawEstablishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawFlamer
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsLetsConnect
 
Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01Arunkumar Shanmugam
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY
 

Similaire à Hadoop operations-2015-hadoop-summit-san-jose-v5 (20)

Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
 
HDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and SupportabilityHDFS: Optimization, Stabilization and Supportability
HDFS: Optimization, Stabilization and Supportability
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Democratizing Memory Storage
Democratizing Memory StorageDemocratizing Memory Storage
Democratizing Memory Storage
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
Establishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan LawEstablishing Environment Best Practices T12 Brendan Law
Establishing Environment Best Practices T12 Brendan Law
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01Backup netezza-tsm-v1403c-140330170451-phpapp01
Backup netezza-tsm-v1403c-140330170451-phpapp01
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 

Dernier

Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 

Dernier (20)

Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

Hadoop operations-2015-hadoop-summit-san-jose-v5

  • 1. Hadoop Operations – Best Practices from the Field June 11, 2015 Chris Nauroth email: cnauroth@hortonworks.com twitter: @cnauroth Suresh Srinivas email: suresh@hortonworks.com twitter: @suresh_m_s
  • 2. © Hortonworks Inc. 2011 About Me Chris Nauroth • Member of Technical Staff, Hortonworks – Apache Hadoop committer, PMC member, and Apache Software Foundation member – Major contributor to HDFS ACLs, Windows compatibility, and operability improvements • Hadoop user since 2010 – Prior employment experience deploying, maintaining and using Hadoop clusters Page 2 Architecting the Future of Big Data
  • 3. © Hortonworks Inc. 2011 Agenda • Analysis of Hadoop Support Cases – Support case trends – Configuration – Software Improvements • Key Learnings and Best Practices – HDFS ACLs – HDFS Snapshots – Reporting DataNode Volume Failures Page 3 Architecting the Future of Big Data
  • 4. © Hortonworks Inc. 2011 Support Case Trends – Proportional Cases per Month Page 4 Architecting the Future of Big Data 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 HDFS Map Reduce YARN Other (37 components)
  • 5. © Hortonworks Inc. 2011 Support Case Trends – Root Cause Page 5 Architecting the Future of Big Data 0 200 400 600 800 1000 1200 Customer Environment (Non HDP) Documentation Defect Documentation Gap Documentation Not Utilized Education - Configuration Needs Training Product Defect YARN Map Reduce HDFS
  • 6. © Hortonworks Inc. 2011 Support Case Trends • Core Hadoop components (HDFS, YARN and MapReduce) are used across all deployments, and therefore receive proportionally more support cases than other ecosystem components. • Misconfiguration is the dominant root cause. • Documentation is a close second. • We are constantly improving the code to eliminate operational issues, help with diagnosis and provide increased visibility. • Best practices get incorporated into Apache Ambari for improved defaults, simplified configuration and deeper monitoring. Page 6 Architecting the Future of Big Data
  • 8. © Hortonworks Inc. 2011 Configuration - Hardware and Cluster Sizing • Considerations –Larger clusters heal faster on nodes or disk failure –Machines with huge storage take longer to recover –More racks give more failure domains • Recommendations – Get good-quality commodity hardware – Buy the sweet-spot in pricing: 3TB disk, 96GB, 8-12 cores – More memory is better – real time is memory hungry! – Before considering fatter machines (1U 6 disks vs. 2U 12 disks) – Get to 30-40 machines or 3-4 racks –Use pilot cluster to learn about load patterns – Balanced hardware for I/O, compute or memory bound – More details - http://tinyurl.com/hwx-hadoop-hw Page 8
  • 9. © Hortonworks Inc. 2011 Configuration – JVM Tuning • Avoid JVM issues – Use 64 bit JVM for all daemons – Compressed OOPS enabled by default (6 u23 and later) – Java heap size – Set same max and starting heapsize, Xmx == Xms – Avoid java defaults – configure NewSize and MaxNewSize – Use 1/8 to 1/6 of max size for JVMs larger than 4G – Configure –XX:PermSize=128 MB, -XX:MaxPermSize=256 MB – Use low-latency GC collector – -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N> – High <N> on Namenode and JobTracker or ResourceManager – Important JVM configs to help debugging – -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails – -XX:ErrorFile=<file> – -XX:+HeapDumpOnOutOfMemoryError Page 9
  • 10. © Hortonworks Inc. 2011 Configuration • Deploy with QuorumJournalManager for high availability • Configure open fd ulimit – Default 1024 is too low – 16K for datanodes, 64K for Master nodes • Use version control for configuration! Page 10
  • 11. © Hortonworks Inc. 2011 Configuration • Use disk fail in place for datanodes: dfs.datanode.failed.volumes.tolerated – Disk failure is no longer datanode failure – Especially important for large density nodes • Set dfs.namenode.name.dir.restore to true – Restores NN storage directory during checkpointing • Take periodic backups of namenode metadata – Make copies of the entire storage directory • Set aside a lot of disk space for NN logs – It is verbose – set aside multiple GBs – Many installs configure this too small – NN logs roll with in minutes – hard to debug issues Page 11
  • 12. © Hortonworks Inc. 2011 Configuration – Monitoring Usage • Cluster storage, nodes, files, blocks grows – Update NN heap, handler count, number of DN xceivers – Tweak other related config periodically • Monitor the hardware usage for your work load – Disk I/O, network I/O, CPU and memory usage – Use this information when expanding cluster capacity • Monitor the usage with HADOOP metrics – JVM metrics – GC times, Memory used, Thread Status – RPC metrics – especially latency to track slowdowns – HDFS metrics – Used storage, # of files and blocks, total load on the cluster – File System operations – MapReduce Metrics – Slot utilization and Job status • Tweak configurations during upgrades/maintenance on an ongoing basis Page 12
  • 13. HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Install & Configure: Ambari Guided Configuration Guide configuration and provide recommendations for the most common settings. (HBase Example Shown here)
  • 14. Software Improvements Real Incidents and Software Improvements to Address Them
  • 15. © Hortonworks Inc. 2011 Don’t edit the metadata files! • Editing can corrupt the cluster state – Might result in loss of data • Real incident – NN misconfigured to point to another NN’s metadata – DNs can’t register due to namespace ID mismatch – System detected the problem correctly – Safety net ignored by the admin! – Admin edits the namenode VERSION file to match ids Mass deletion of unknown blocks that do not exist in that namespace Page 15
  • 16. © Hortonworks Inc. 2011 Improvement • Pause deletion of blocks when the namenode starts up – https://issues.apache.org/jira/browse/HDFS-6186 – Supports configurable delay of block deletions after NameNode startup – Gives an admin extra time to diagnose before deletions begin • Show when block deletion will start after NameNode startup in WebUI – https://issues.apache.org/jira/browse/HDFS-6385 – The web UI already displayed the number of pending block deletions – This enhanced the display to indicate when actual deletion will begin Page 16 Architecting the Future of Big Data
  • 17. © Hortonworks Inc. 2011 Block Deletion Start Time Page 17 Architecting the Future of Big Data New
  • 18. © Hortonworks Inc. 2011 Guard Against Accidental Deletion • rm –r deletes the data at the speed of Hadoop! – ctrl-c of the command does not stop deletion! – Undeleting files on datanodes is hard & time consuming – Immediately shutdown NN, unmount disks on datanodes – Recover deleted files – Start namenode without the delete operation in edits • Enable Trash • Real Incident – Customer is running a distro of Hadoop with trash not enabled – Deletes a large dir (100 TB) and shuts down NN immediately – Support person asks NN to be restarted to see if trash is enabled! Blocks start deleting Page 18
  • 19. © Hortonworks Inc. 2011 Improvement • HDFS Snapshots – https://issues.apache.org/jira/browse/HDFS-2802 – A snapshot is a read-only point-in-time image of part of the file system – A snapshot created before a deletion can be used to restore deleted data – More coverage of snapshots later in the presentation • HDFS ACLs – https://issues.apache.org/jira/browse/HDFS-4685 – Finer-grained control of file permissions can help prevent an accidental deletion – More coverage of ACLs later in the presentation Page 19 Architecting the Future of Big Data
  • 20. © Hortonworks Inc. 2011 Unexpected error during HA HDFS upgrade • Background: HDFS HA Architecture – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html • Real Incident – During upgrade, NameNode calls every JournalNode to request backup of metadata directory, which renames “current” directory to “previous.tmp”. – Permissions incorrect on metadata directory for 1 out of 3 JournalNodes. – The hdfs user is not authorized to rename. Backup fails for that JournalNode, so upgrade process aborts with error. Root cause not easily identifiable, long time to recover Page 20 Architecting the Future of Big Data
  • 21. © Hortonworks Inc. 2011 Improvement • Improve diagnostics on storage directory rename operations by using native code. – https://issues.apache.org/jira/browse/HDFS-7118 – Logs additional root cause information for rename failure. For example, EACCES • Split error checks in into separate conditions to improve diagnostics. – https://issues.apache.org/jira/browse/HDFS-7119 – Splits a log message about failure to delete or rename into separate log messages to clarify which specific action failed • When aborting NameNode or JournalNode, write the contents of the metadata directories and permissions to logs. – https://issues.apache.org/jira/browse/HDFS-7120 – Usually the first information asked of the user, so we can automate this • For JournalNode operations that must succeed on all nodes, execute a pre-check to verify that the operation can succeed. – https://issues.apache.org/jira/browse/HDFS-7121 – Prevents need for manual cleanup on 2 out of 3 JournalNodes where backup succeeded Page 21 Architecting the Future of Big Data
  • 22. Key Learnings and Best Practices Features that Help Improve Production Operations
  • 23. © Hortonworks Inc. 2011 HDFS ACLs • Existing HDFS POSIX permissions good, but not flexible enough – Permission requirements may differ from the natural organizational hierarchy of users and groups. • HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model. – An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and file’s group. Page 23 Architecting the Future of Big Data
  • 24. © Hortonworks Inc. 2011 HDFS File Permissions Example • Authorization requirements: –In a sales department, they would like a single user Maya (Department Manager) to control all modifications to sales data –Other members of sales department need to view the data, but can’t modify it. –Everyone else in the company must not be allowed to view the data. • Can be implemented via the following: Read/Write perm for user maya User Group Read perm for group sales File with sales data
  • 25. © Hortonworks Inc. 2011 HDFS ACLs • Problem –No longer feasible for Maya to control all modifications to the file – New Requirement: Maya, Diane and Clark are allowed to make modifications – New Requirement: New group called executives should be able to read the sales data –Current permissions model only allows permissions at 1 group and 1 user • Solution: HDFS ACLs –Now assign different permissions to different users and groups Owner Group Others HDFS Directory … rwx … rwx … rwx Group D … rwx Group F … rwx User Y … rwx
  • 26. © Hortonworks Inc. 2011 HDFS ACLs New Tools for ACL Management (setfacl, getfacl) – hdfs dfs -setfacl -m group:execs:r-- /sales-data – hdfs dfs -getfacl /sales-data # file: /sales-data # owner: maya # group: sales user::rw- group::r-- group:execs:r-- mask::r-- other::-- – How do you know if a directory has ACLs set? – hdfs dfs -ls /sales-data Found 1 items -rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
  • 27. © Hortonworks Inc. 2011 HDFS ACLs Best Practices • Start with traditional HDFS permissions to implement most permission requirements. • Define a smaller number of ACLs to handle exceptional cases. • A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions. Page 27 Architecting the Future of Big Data
  • 28. © Hortonworks Inc. 2011 HDFS Snapshots • HDFS Snapshots – A snapshot is a read-only point-in-time image of part of the file system – Performance: snapshot creation is instantaneous, regardless of data size or subtree depth – Reliability: snapshot creation is atomic – Scalability: snapshots do not create extra copies of data blocks – Useful for protecting against accidental deletion of data • Example: Daily Feeds hdfs dfs -ls /daily-feeds Found 5 items drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13 drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-16 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17 Page 28 Architecting the Future of Big Data
  • 29. © Hortonworks Inc. 2011 HDFS Snapshots • Create a snapshot after each daily load hdfs dfsadmin -allowSnapshot /daily-feeds Allowing snaphot on /daily-feeds succeeded hdfs dfs -createSnapshot /daily-feeds snapshot-to-2014-10-17 Created snapshot /daily-feeds/.snapshot/snapshot-to-2014-10-17 • User accidentally deletes data for 2014-10-16 hdfs dfs -ls /daily-feeds Found 4 items drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-13 drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily-feeds/2014-10-14 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-15 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily-feeds/2014-10-17 Page 29 Architecting the Future of Big Data
  • 30. © Hortonworks Inc. 2011 HDFS Snapshots • Snapshots to the rescue: the data is still in the snapshot hdfs dfs -ls /daily-feeds/.snapshot/snapshot-to-2014-10-17 Found 5 items drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily- feeds/.snapshot/snapshot-to-2014-10-17/2014-10-13 drwxr-xr-x - chris supergroup 0 2014-10-13 14:36 /daily- feeds/.snapshot/snapshot-to-2014-10-17/2014-10-14 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily- feeds/.snapshot/snapshot-to-2014-10-17/2014-10-15 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily- feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 drwxr-xr-x - chris supergroup 0 2014-10-13 14:37 /daily- feeds/.snapshot/snapshot-to-2014-10-17/2014-10-17 • Restore data from 2014-10-16 hdfs dfs -cp /daily-feeds/.snapshot/snapshot-to-2014-10-17/2014-10-16 /daily-feeds Page 30 Architecting the Future of Big Data
  • 31. © Hortonworks Inc. 2011 Reporting DataNode Volume Failures • Configuring dfs.datanode.failed.volumes.tolerated > 0 enables a DataNode to keep running after volume failures • DataNode is still running, but capacity is degraded • HDFS already provided a count of failed volumes for each DataNode, but no further details • Apache Hadoop 2.7.0 provides more information: failed path, estimated lost capacity and failure date/time • An administrator can use this information to prioritize cluster maintenance work Page 31 Architecting the Future of Big Data
  • 32. © Hortonworks Inc. 2011 Reporting DataNode Volume Failures Page 32 Architecting the Future of Big Data New
  • 33. © Hortonworks Inc. 2011 Reporting DataNode Volume Failures Page 33 Architecting the Future of Big Data
  • 34. © Hortonworks Inc. 2011 Reporting DataNode Volume Failures Page 34 Architecting the Future of Big Data
  • 35. © Hortonworks Inc. 2011 Reporting DataNode Volume Failures • Everything in the web UI is sourced from standardized Hadoop metrics – Each DataNode publishes its own metrics – NameNode publishes aggregate information from every DataNode • Metrics accessible through JMX or the HTTP /jmx URI • Integrated in Ambari • Can be integrated into your preferred management tools and ops dashboards Page 35 Architecting the Future of Big Data
  • 36. New System to Manage the Health of Hadoop Clusters • Ambari Alerts are installed and configured by default • Health Alerts and Metrics managed via Ambari Web
  • 37. © Hortonworks Inc. 2011 Summary • Configuration – Prevent garbage collection issues – Configure for redundancy – Retune configuration in response to metrics • HDFS ACLs – Implement fine-grained authorization rules on files – Can protect against accidental file manipulations • HDFS Snapshots – Point-in-time image of part of the filesystem – Useful for restoring to a prior state after accidental file manipulation • Reporting DataNode Volume Failures – Metrics and web UI exposing information about volume failures on DataNodes – Useful for planning cluster maintenance work • Use Ambari – Helps install, configure, monitor and manage Hadoop clusters – Incorporates the latest best practices Page 37 Architecting the Future of Big Data
  • 38. © Hortonworks Inc. 2011 Thank you, Q&A Resource Location Hardware Recommendations for Apache Hadoop http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning- guide/content/ch_hardware-recommendations.html HDFS operational and debuggability improvements https://issues.apache.org/jira/browse/HDFS-6185 HDFS ACLs Blog Post http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/ HDFS Snapshots Blog Post http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/ Learn more Contact me with your operations questions and suggestions Chris Nauroth – cnauroth@hortonworks.com

Notes de l'éditeur

  1. First, a quick introduction. My name is Chris Nauroth. I’m a software engineer on the HDFS team at Hortonworks. I’m an Apache Hadoop committer and PMC member. I’m also an Apache Software Foundation member. Some of my major contributions include HDFS ACLs, Windows compatibility and various operability improvements. Prior to Hortonworks, I worked for Disney and did an initial deployment of Hadoop there. As part of that job, I worked very closely with the systems engineering team responsible for maintaining those Hadoop clusters, so I tend to think back to that team and get excited about things I can do now as a software engineer to help make that team’s job easier. I’m also here with Suresh Srinivas, one of the founders of Hortonworks, and a long-time Hadoop committer and PMC member. He has a lot of experience supporting some of the world’s largest clusters at Yahoo and elsewhere. Together with Suresh, we have experience supporting Hadoop clusters since 2008.
  2. For today’s agenda, I’d like to start by sharing some analysis that we’ve done of support case trends. In that analysis, we’re going to see that some common patterns emerge, and that’s going to lead into a discussion of configuration best practices and software improvements. In the second half of the talk, we’ll move into a discussion of key learnings and best practices around how recent HDFS features can help prevent problems or manage day-to-day maintenance.
  3. Let’s dive into the support case analysis. The data source for this chart is the entire history of support cases at Hortonworks. The x-axis is month and the y-axis is the proportion of support cases reported against a specific component. The chart focuses on 3 components that we define as the core of Hadoop: HDFS, YARN and MapReduce. All other components in the ecosystem are collapsed into a single line. Here we see a trend stabilizing around 30% of support cases driven from those core components. It also makes sense intuitively that a large proportion of support cases are driven from those core components, because every deployment uses them. As you rise up the stack, deployments start to vary in the components they choose to deploy. For example, a deployment may or may not deploy Hbase depending on its use cases.
  4. The second chart shows an analysis of root cause category in each of those 3 core components. The source data contains many additional root cause categories. I’ve chosen to prune this down to the most significant ones to simplify the chart. The pattern that we see here is that a lot of support cases are driven by configuration issues or documentation problems. On an interesting side note, I gave a version of this presentation last year at Strata, and since then I’ve refreshed these charts with current data. Something I noticed is that documentation, configuration and software defects are propotionally a little bit smaller than last time. We’ve been investing a lot of energy in these areas, so it was satisfying to see the data showing that those efforts have been somewhat successful.
  5. Investment in operations at the core helps the most users.
  6. With that, let’s move into a discussion of common configuration issues that we continue to see.
  7. Fewer nodes is less resilient than many nodes. Failure of a DataNode that’s heavier on storage causes more re-replication activity. Map Reduce jobs may need to rerun more tasks. Commodity != poor quality.
  8. Compressed ordinary object pointers are a technique used in the JVM to represent managed pointers as 32-bit offsets from a 64-bit base heap address. This saves on the space taken by 64-bit native pointers. We used to have a recommendation to pass a JVM argument to turn this on. Recent JVM versions just use it by default. Xmx different from Xms can cause big expensive malloc. Surprising results when you run out of memory late in the process lifetime. N=8 typically. Oom-killer.
  9. NameNode high availability was a very hot topic a few years ago. At this point, the recommended HA architecture is to use QuorumJournalManager, which sets up an active-standby pair of NameNodes and offloads edit logging to a separate set of daemons called the JournalNodes. On a side note, version control for configuration is a good thing. It can be helpful to look back on the history of changes or restore to a last known good state.
  10. The DataNode has a feature called disk-fail-in-place that allows it to keep running even if individual volumes have failed. This is off by default, but you can turn it on by editing hdfs-site.xml and setting property dfs.datanode.failed.volumes.tolerated to the number of volumes that you tolerate failing before shutting down the entire DataNode. This is useful for large-density nodes, meaning nodes that have a lot of disks. If you have a node with 16 disks, and 2 disks fail, you’d probably prefer to keep that DataNode running with 14 disks available to serve clients instead of shutting down the whole thing. dfs.namenode.name.dir.restore is a property that controls whether or not the NameNode should attempt to bring back into service metadata storage directories that previously failed. By turning this on, you have the ability to repair a failed directory online and bring it back into service without restarting the NameNode process. We recommend taking periodic backups of the NameNode metadata. Copy the entire storage directory. Also plan on reserving a lot of disk for NameNode logs. A common pitfall is choosing too little space for logs, which then forces you to configure Log4J to roll logs very rapidly, and this can make debugging harder.
  11. Something to keep in mind that usage patterns on a cluster tend to change over time as use cases change. Configuration may need to change in reaction to changing usage patterns. If you have a major upgrade or maintenance planned, then that’s a good opportunity to review configurations and see if anything else needs to change.
  12. Increasingly, we’re pushing configuration best practices into the implementation of Ambari. This takes the burden off of administrators to remember these best practices during deployments. For those who don’t know, Apache Ambari is an open source cluster deployment and management tool. For a little variety, I chose to pull a screenshot related to HBase. Here we can see that Ambari starts by recommending some good defaults, but still gives administrators the option to tune settings to match their specific needs.
  13. Next, I’d like to discuss a few software improvements that were prompted by our experiences in support cases. We’ve found that often very small code changes can have a big impact on preventing problems or recovering from them. I’m going to discuss some real incidents that we’ve seen and how they led us to make those code changes.
  14. First, a public service announcement: don’t edit the metadata files. The NameNode metadata files are crucial for maintaining the state of the file system, so editing them can corrupt cluster state and result in loss of data. Don’t edit them. Now that I’ve said that, let’s talk about editing the metadata files. This is a real incident. A NameNode was misconfigured to point to the metadata from a different NameNode. An important note here is that part of the NameNode metadata is a namespace ID, which uniquely identifies that file system namespace. When DataNodes register with a NameNode for the first time, they also acquire that namespace ID and persist it locally. On subsequent DataNode restarts, the NameNode has a check that the DataNode attempting to register with it is presenting the same namespace ID. After NameNode restart, the DataNodes could not register with the NameNode because of the namespace ID mismatch. The system detected the problem correctly, and so far everything is working as designed. However, the admin thought an appropriate fix would be to manually edit the VERSION file, which is the part of the metadata containing the namespace ID, and change it to match what the DataNodes were reporting. “What happens next?” The problem is that the NameNode’s fsimage also persists the block IDs that are known for each file. When these DataNodes from a different cluster started sending their block reports, the NameNode replied by saying these blocks do not exist in my namespace, and therefore they should be deleted.
  15. This is the HDFS web UI, now with a small enhancement to show the time when block deletions will start.
  16. HDFS is known for being a scalable system. One of the things it’s really awesome at is scaling deletes! This can be a scary situation if someone deletes the wrong thing, because attempting to recover by undeleting block files is error-prone and time-consuming work across all DataNodes. We recommend enabling the HDFS trash feature as a safety net, which essentially changes deletes into renames, and the NameNode can then reap the trash files at a later time. However, I’m going to talk about a real incident in which trash was not enabled. There was a large directory deleted, and the admin realized this was a mistake and chose to shut down the NameNode immediately. The support engineer taking the case naturally figured we could restore from trash, so advised restarting the NameNode. “What happens next?”
  17. This incident really points out the importance of protecting data against accidental deletion. HDFS snapshots and HDFS ACLs are two features that I think help with this. I’ll have more coverage of these features later in the presentation.
  18. “What happens next?”
  19. If you’ve used POSIX ACLs on a Linux file system, then you already know how it works in HDFS too.
  20. By convention, snapshots can be referenced as a file system path under sub-directory “.snapshot”.
  21. Here is a screenshot pointing out a change in the HDFS web UI: Total Datanode Volume Failures is a hyperlink. Clicking that jumps to…
  22. …this new screen listing the volume failures in detail. We can see the path of each failed storage location, and an estimate of the capacity that was lost. I think of this screen being used by a system engineer as a to-do list as part of regular cluster maintenance.
  23. Here is what it looks like when there are no volume failures. I included this picture, because this is what we all want it to look like. Of course, it won’t always be that way.