  1. Introduction to MapReduce
  3. Sample Data Set •  10 billion web pages •  Average size – 20 KB •  Total size = 10 billion × 20 KB = 200 TB •  Disk read bandwidth = 50 MB/sec •  Time to read = 4 million sec = 46+ days •  Even reading the data takes weeks, let alone processing it
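A quick sanity check of the arithmetic on this slide (Python used purely as a calculator, with decimal units):

```python
# Back-of-the-envelope check of the slide's numbers.
pages = 10**10                      # 10 billion web pages
avg_size = 20 * 10**3               # 20 KB per page (decimal units)
total_bytes = pages * avg_size
assert total_bytes == 200 * 10**12  # 200 TB

bandwidth = 50 * 10**6              # 50 MB/sec disk read bandwidth
seconds = total_bytes / bandwidth   # 4,000,000 seconds
print(round(seconds / 86400, 1))    # ≈ 46.3 days
```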
  4. Huge Computations and Large Data Sets •  Hundreds of special-purpose computations •  The computations process large amounts of raw data –  Crawled documents –  Web request logs
  5. Huge Computations and Large Data Sets •  Compute various types of derived data –  Inverted indices –  Various representations of the graph structure of web documents –  Summaries of the number of pages crawled per host –  Sets of the most frequent queries in a given day
  6. Huge Computations and Large Data Sets •  The computations are straightforward, but –  The input data is very large –  Computations have to be distributed across hundreds or thousands of machines to finish in a reasonable amount of time
  8. Node Failures •  A single server can stay up for 3 years •  1,000 servers in a cluster => 1 failure per day •  100K servers => 100 failures per day •  How to store data persistently and keep it available if nodes fail? •  How to deal with node failures during a long-running computation?
  9. Network Bottleneck •  Network bandwidth = 1 Gbps •  Moving 10 TB takes approximately 1 day •  Distributed programming is hard
  10. MOTIVATION
  11. Motivation: Large-Scale Data Processing •  The framework should –  Run on a large cluster of machines –  Be highly scalable –  Handle terabytes of data on thousands of machines –  Be easy for programmers •  Hundreds of MapReduce programs have been implemented •  Many real-world tasks are expressible in this model
  12. Motivation: Large-Scale Data Processing •  The run-time system takes care of –  Handling large amounts of input data –  Scheduling the program's execution across a set of machines –  Handling machine failures –  Managing the required inter-machine communication •  Big benefit –  Programmers without any experience in parallel and distributed systems can easily utilize the resources of a large distributed system
  14. Motivation – Distributed Task Execution •  Problem statement: there is a large computational problem that can be divided into multiple parts, and the results from all parts can be combined to obtain the final result. •  Applications: –  Physical and engineering simulations, numerical analysis, performance testing –  Large-scale indexing
  15. Summary and Aggregation •  There are a number of documents, where each document is a set of terms. It is required to calculate the total number of occurrences of each term across all documents, or, more generally, an arbitrary function of the terms. For instance, given a log file where each record contains a response time, calculate the average response time. •  Applications: –  Log analysis, data querying, ETL, data validation
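The aggregation pattern above can be sketched in plain Python (an illustrative simulation of the map → group-by-key → reduce flow, not Hadoop code; the log records below are made up for the example):

```python
# Average response time per request, as a map/reduce-style aggregation:
# map emits (key, (value, 1)); reduce sums the pairs and divides.
from collections import defaultdict

log = [("GET /a", 120), ("GET /a", 80), ("GET /b", 300)]  # hypothetical (request, ms) records

def map_fn(request, ms):
    yield (request, (ms, 1))

def reduce_fn(request, pairs):
    total = sum(ms for ms, _ in pairs)
    count = sum(n for _, n in pairs)
    yield (request, total / count)

groups = defaultdict(list)          # the "framework" groups values by key
for req, ms in log:
    for k, v in map_fn(req, ms):
        groups[k].append(v)

averages = dict(kv for k, vs in groups.items() for kv in reduce_fn(k, vs))
print(averages)  # {'GET /a': 100.0, 'GET /b': 300.0}
```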
  16. Filtering, Parsing and Validation •  There is a set of records, and it is required to collect all records that meet some condition, or to transform each record (independently of the other records) into another representation. The latter case includes tasks such as text parsing, value extraction, and conversion from one format to another. •  Applications –  Log analysis, data querying, ETL, data validation
  17. Iterative Message Passing (Graph Processing) •  There is a network of entities and relationships between them. It is required to calculate a state for each entity on the basis of the properties of the other entities in its neighborhood. This state can represent a distance to other nodes, an indication that there is a neighbor with certain properties, a characteristic of neighborhood density, and so on. •  Applications –  Social network analysis –  Supply chain
  18. Cross-Correlation •  There is a set of tuples of items. For each possible pair of items, calculate the number of tuples in which those items co-occur. If the total number of items is N, then N×N values should be reported. •  Applications: –  Text analysis, market analysis
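The cross-correlation pattern is often implemented with the "pairs" approach, sketched here in plain Python (illustrative only; the market baskets are invented for the example):

```python
# "Pairs" cross-correlation: for each input tuple, map emits every ordered
# pair of items with a count of 1; reduce (here, a summing dict) totals them.
from collections import defaultdict
from itertools import permutations

baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"]]

def map_fn(basket):
    for a, b in permutations(basket, 2):   # every ordered pair in the tuple
        yield ((a, b), 1)

counts = defaultdict(int)                  # shuffle + summing reduce in one step
for basket in baskets:
    for pair, n in map_fn(basket):
        counts[pair] += n

print(counts[("milk", "bread")])  # 2 -- milk and bread co-occur in two baskets
```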
  19. Relational MapReduce Patterns •  Selection •  Projection •  Union •  Intersection •  Difference •  GroupBy and Aggregation •  Joining
  20. SOLUTION
  21. Solution – MapReduce •  Addresses the challenges of cluster computing •  Store data redundantly on multiple nodes for persistence and availability •  Move computation close to the data, to minimize data movement •  Simple programming model, to hide the complexity of all this magic
  22. Solution – MapReduce •  The MapReduce abstraction allows us to express simple computations while hiding, in a library, the messy details of –  Parallelization –  Fault tolerance –  Data distribution –  Load balancing
  23. History of MapReduce •  The speed at which MapReduce has been adopted is remarkable. •  It went from an interesting paper from Google in 2004 to a widely adopted industry standard in distributed data processing by 2012. •  The actual origins of MapReduce are arguable, but the paper most often cited as the one that started us down this journey is "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat (2004). •  This paper described how Google split, processed, and aggregated a data set of mind-boggling size. •  Shortly after the release of the paper, free and open source software pioneer Doug Cutting started working on a MapReduce implementation to solve scalability in another project he was working on, Nutch, an effort to build an open source search engine. •  Over time, and with some investment by Yahoo!, Hadoop split out as its own project and eventually became a top-level Apache Foundation project. •  Today, numerous independent people and organizations contribute to Hadoop. Every new release adds functionality and boosts performance. •  Several other open source projects have been built with Hadoop at their core, and this list is continually growing. Some of the more popular ones include Pig, Hive, HBase, Mahout, and ZooKeeper. •  Doug Cutting and other Hadoop experts have mentioned several times that Hadoop is becoming the kernel of a distributed operating system on which distributed applications can be built.
  24. Programming Model •  The programming model addresses the following –  The user should not have to worry about the framework –  The user should only have to write the business logic –  The user should have good productivity –  The model should be simple
  25. Distribute the Work •  Spread the work across more than one machine •  Challenges –  Communication and coordination –  Recovering from machine failure –  Status and progress reporting –  Debugging –  Optimization –  Locality •  These challenges are largely the same for every problem
  26. Compute Machines •  CPUs – typically 2 or 4 –  Typically hyper-threaded or dual-core •  Multiple locally attached hard disks •  4 GB to 16 GB RAM •  Challenges –  Single-thread performance doesn't matter –  Unreliable machines: if one server fails about once in 1,000 days, then with 10,000 servers you lose roughly 10 per day –  Ultra-reliable hardware does not help: it may fail less often, but the software still needs to be fault-tolerant, and commodity machines give more performance per dollar
  27. What is MapReduce? •  "A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs." –  Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Google Inc. •  In simple terms –  A distributed/parallel programming model and its associated implementation
  29. MapReduce – Programming Model •  Process data using special Map() and Reduce() functions –  The Map function is called on every item in the input and emits a series of intermediate key/value pairs –  All values associated with a given key are grouped together –  The Reduce function is called on every unique key, with its list of values, and emits a value that is added to the output
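The canonical example of this model is word count, sketched here in plain Python (an illustrative simulation of the Map → group-by-key → Reduce flow; a real Hadoop job would implement Mapper and Reducer classes instead):

```python
# Word count in the MapReduce programming model.
from collections import defaultdict

docs = ["the quick brown fox", "the lazy dog", "the fox"]

def map_fn(doc):                     # called on every item in the input
    for word in doc.split():
        yield (word, 1)              # emits intermediate key/value pairs

def reduce_fn(word, counts):         # called once per unique key
    yield (word, sum(counts))

groups = defaultdict(list)           # the framework groups values by key
for doc in docs:
    for k, v in map_fn(doc):
        groups[k].append(v)

counts = dict(kv for k, vs in sorted(groups.items()) for kv in reduce_fn(k, vs))
print(counts["the"], counts["fox"])  # 3 2
```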
  30. MapReduce •  The messy details are transparent –  Automatic parallelization –  Load balancing –  Network and disk transfer optimization –  Handling machine failures –  Robustness
  31. MapReduce •  Map(k1, v1) --> list(k2, v2) •  Reduce(k2, list(v2)) --> list(v2) •  Runtime system –  Partitions input data –  Schedules execution across a set of machines –  Handles machine failures –  Manages interprocess communication
  32. Benefits •  Greatly reduces parallel programming complexity –  Reduces synchronization complexity –  Automatically partitions data –  Provides failure transparency –  Handles load balancing
  33. MapReduce – Execution Overview, Step 1 •  The user program, via the MapReduce library, splits the input data into shards (diagram: Input Data → Shard 0 … Shard 6) •  Shards/splits are typically 16–64 MB in size
  34. Execution Overview – Step 2 •  The library creates process copies on a distributed machine cluster •  One copy becomes the "Master" and the others become workers (diagram: User Program → Master, workers)
  35. Execution Overview – Step 3 •  The Master distributes M map and R reduce tasks to idle workers –  M = number of input splits –  R = number of parts the intermediate key space is divided into (diagram: Master → idle worker, message Do_map_task)
  36. Execution Overview – Step 4 •  Each map-task worker reads its assigned input shard and outputs intermediate key/value pairs –  Output is buffered in RAM (diagram: Input Split 0 → map worker → key/value pairs)
  37. Execution Overview – Step 5 •  Each worker flushes its intermediate values, partitioned into R regions, to local disk and notifies the Master process (diagram: map worker → local storage; disk locations → Master)
  38. Execution Overview – Step 6 •  The Master process gives the disk locations to an available reduce-task worker, which reads all of the associated intermediate data remotely (diagram: Master → disk locations → reduce worker → remote storage)
  39. Execution Overview – Step 7 •  Each reduce-task worker sorts and shuffles its intermediate data, then calls the reduce function, passing in each unique key and its associated values •  The reduce function's output is appended to the reduce task's partition output file (diagram: reduce worker → sorted data → partition output file)
  40. Execution Overview – Step 8 •  The Master process wakes up the user program when all tasks have completed •  The output is contained in R output files (diagram: Master → wakeup → User Program; output files)
  42. Weather Data (sample records; each record is a single fixed-width line) •  0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999 •  0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999 •  0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999 •  0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999 •  0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  43. Data Set (fields of the record 0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999) •  0043 •  012650 – weather station identifier •  99999 – other identifier •  19490324 – observation date •  1800 – observation time •  4 •  +62300 – latitude •  +010750 – longitude •  … •  Quality code •  … •  Air temperature •  … •  Atmospheric pressure •  etc.
  44. Data → Map output → Reduce output •  Maximum global temperature recorded each year (in tenths of a degree Celsius): 1949 → 111, 1950 → 22
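The max-temperature flow above can be sketched in plain Python against the five sample records (an illustrative simulation, not the Hadoop job; the field offsets, year at 15:19 and signed temperature at 87:92 with a quality digit at 92, are standard for this fixed-width weather format):

```python
# Max temperature per year, MapReduce-style, over the slide's sample records.
from collections import defaultdict

records = [
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
    "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
]

def map_fn(line):
    year = line[15:19]
    temp = int(line[87:92])            # signed tenths of a degree Celsius
    quality = line[92]
    if temp != 9999 and quality in "01459":   # skip missing/bad readings
        yield (year, temp)

def reduce_fn(year, temps):
    yield (year, max(temps))

groups = defaultdict(list)             # shuffle: group values by key
for line in records:
    for k, v in map_fn(line):
        groups[k].append(v)

output = dict(kv for k, vs in sorted(groups.items()) for kv in reduce_fn(k, vs))
print(output)  # {'1949': 111, '1950': 22}
```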
  45. MapReduce Logical Flow (diagram)
  46. Map Function (code listing)
  47. Reduce Function (code listing)
  48. Map Inputs and Outputs •  Map input is read from HDFS •  Map output is written to local disk •  Reduce reads its input from all the map disks via an RPC mechanism •  Reduce output is stored in HDFS
  49. Single Reduce Task (diagram)
  50. Multiple Reduce Tasks (diagram)
  51. No Reduce Task (diagram)
  52. Combiner Function •  Allows the output of each map task to be combined locally before the shuffle •  It is very similar to a reduce task
  53. Installing ssh •  sudo apt-get install ssh (or: rpm -i ssh.rpm) •  Passwordless login –  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa –  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys •  Test with ssh localhost
  54. Installing Hadoop •  tar xzf hadoop-0.20.0.tar.gz •  export JAVA_HOME=/jdk1.6path •  export HADOOP_INSTALL=/home/reddyraja/hadoop-0.20.0 •  export PATH=$PATH:$HADOOP_INSTALL/bin •  Check the installation –  hadoop version
  55. Standalone Mode •  Everything runs in a single JVM •  Suitable for development of MapReduce programs •  Easy to test and debug •  No daemons to run •  Commands –  Compile: javac -classpath $HADOOP_INSTALL/hadoop-0.20.0-core.jar -d bin NewMaxTemperature.java –  Create the jar: jar -cvf maxtemp.jar -C bin . –  Run the example: hadoop jar maxtemp.jar NewMaxTemperature input output
  56. Pseudo-Distributed Mode •  Hadoop daemons run on the local machine •  Simulates a cluster
  57. Hadoop and Others… To Avoid Confusion (Google technology → open source counterpart) •  MapReduce → Hadoop •  GFS → HDFS •  BigTable → HBase •  Chubby → ZooKeeper
  58. MapReduce Terms •  Payload – applications implement the Map and Reduce functions; these form the core of the job •  Mapper – maps input key/value pairs to a set of intermediate key/value pairs •  NameNode – node that manages the HDFS file system •  DataNode – node where the data is present •  Master node – node where the JobTracker runs •  Slave node – node where the Map and Reduce tasks run
  59. MapReduce Terms, continued… •  JobTracker – tracks and assigns jobs to TaskTrackers; schedules jobs •  TaskTracker – tracks its tasks and reports status to the JobTracker •  Job – a "full program": an execution of a Mapper and Reducer across a data set •  Task – an execution of a Mapper or a Reducer on a slice of data •  Task attempt – a particular instance of an attempt to execute a task on a machine
  60. Terminology Example •  Running "Word Count" across 20 files is one job •  20 files to be mapped imply 20 map tasks, plus some number of reduce tasks •  At least 20 map task attempts will be performed – more if a machine crashes, etc. •  The Map and Reduce functions live inside the WordCount program
  61. Task Attempts •  A particular task will be attempted at least once, possibly more times if it crashes –  If the same input causes crashes over and over, that input will eventually be abandoned •  Multiple attempts at one task may occur in parallel when speculative execution is turned on –  The task ID from TaskInProgress is not a unique identifier; don't use it that way
  62. MapReduce: High Level •  A MapReduce job is submitted by a client computer to the JobTracker on the master node •  TaskTracker instances on the slave nodes run the task instances (diagram: master node [JobTracker] → slave nodes [TaskTracker → task instance])
  63. Hadoop Deployment (diagram)
  64. Hadoop Stack (diagram)
  65. HDFS Concepts •  Distributed file system •  Block storage •  NameNodes and DataNodes •  Command-line interface •  Basic file system operations •  Java interfaces
  66. HDFS Architecture (diagram)
  67. Node-to-Node Communication •  Hadoop uses its own RPC protocol •  All communication begins in the slave nodes –  This prevents circular-wait deadlock –  Slaves periodically poll for "status" messages •  Classes must provide explicit serialization
  68. Nodes, Trackers, Tasks •  The master node runs a JobTracker instance, which accepts job requests from clients •  TaskTracker instances run on the slave nodes •  The TaskTracker forks a separate Java process for each task instance
  69. Job Distribution •  MapReduce programs are contained in a Java "jar" file plus an XML file containing serialized program configuration options •  Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code •  … So where is the data distribution?
  70. Data Distribution •  Implicit in the design of MapReduce! –  All mappers are equivalent, so each maps whatever data is local to its particular node in HDFS •  If lots of data does happen to pile up on the same node, nearby nodes will map it instead –  Data transfer is handled implicitly by HDFS
  71. Configuring with JobConf •  MapReduce programs have many configurable options •  JobConf objects hold (key, value) components, mapping String → 'a –  e.g., "mapred.map.tasks" → 20 –  JobConf is serialized and distributed before the job is run •  Objects implementing JobConfigurable can retrieve elements from a JobConf
  72. Job Launch Process: Client •  The client program creates a JobConf –  Identify classes implementing the Mapper and Reducer interfaces •  JobConf.setMapperClass(), setReducerClass() –  Specify inputs and outputs •  JobConf.setInputPath(), setOutputPath() –  Optionally, other options too: •  JobConf.setNumReduceTasks(), JobConf.setOutputFormat()…
  73. Job Launch Process: JobClient •  Pass the JobConf to JobClient.runJob() or submitJob() –  runJob() blocks, submitJob() does not •  JobClient: –  Determines the proper division of input into InputSplits –  Sends job data to the master JobTracker server
  74. Job Launch Process: JobTracker •  The JobTracker: –  Inserts the jar and JobConf (serialized to XML) in a shared location –  Posts a JobInProgress to its run queue
  75. Job Launch Process: TaskTracker •  TaskTrackers running on slave nodes periodically query the JobTracker for work •  They retrieve the job-specific jar and config •  They launch the task in a separate instance of Java –  main() is provided by Hadoop
  76. Job Launch Process: Task •  TaskTracker.Child.main(): –  Sets up the child TaskInProgress attempt –  Reads the XML configuration –  Connects back to the necessary MapReduce components via RPC –  Uses TaskRunner to launch the user process
  77. Job Launch Process: TaskRunner •  TaskRunner, MapTaskRunner, and MapRunner work in a daisy-chain to launch your Mapper –  The task knows ahead of time which InputSplits it should be mapping –  It calls the Mapper once for each record retrieved from the InputSplit •  Running the Reducer is much the same
  78. Creating the Mapper •  You provide the instance of Mapper –  It should extend MapReduceBase •  One instance of your Mapper is initialized by the MapTaskRunner for a TaskInProgress –  It exists in a separate process from all other instances of Mapper – no data sharing!
  79. What is Writable? •  Hadoop defines its own "box" classes for strings (Text), integers (IntWritable), etc. •  All values are instances of Writable •  All keys are instances of WritableComparable
  80. Data to the Mapper (diagram: input files → InputFormat → InputSplits → RecordReaders → Mappers → intermediates)
  81. Reading Data •  Data sets are specified by InputFormats –  Defines the input data (e.g., a directory) –  Identifies the partitions of the data that form an InputSplit –  Acts as a factory for the RecordReader objects that extract (k, v) records from the input source
  82. InputFormat •  Describes the input specification for a MapReduce job •  MapReduce relies on the InputFormat to –  Validate the input specification of the job –  Split the input files into logical InputSplits, each of which is then assigned to an individual Mapper –  Provide the RecordReader implementation used to glean input records from the logical split for processing by the Mapper •  FileInputFormat – splits files based on file size –  The file system block size is the upper limit –  A lower limit can be set by mapred.min.split.size
  83. FileInputFormat •  TextInputFormat – treats each newline-terminated line of a file as a value •  KeyValueTextInputFormat – maps newline-terminated text lines of "k SEP v" •  SequenceFileInputFormat – binary file of (k, v) pairs with some additional metadata •  SequenceFileAsTextInputFormat – same, but maps (k.toString(), v.toString())
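What KeyValueTextInputFormat does with each newline-terminated line can be sketched in a few lines of plain Python (illustrative only; the tab separator and the sample data below are assumptions for the example, and the real class is configurable):

```python
# Split each newline-terminated line into a (key, value) record at the
# first occurrence of the separator, the way KeyValueTextInputFormat does.
def key_value_records(text, sep="\t"):
    for line in text.split("\n"):
        if line:                              # skip the trailing empty line
            key, _, value = line.partition(sep)
            yield (key, value)

data = "apple\tfruit\ncarrot\tvegetable\n"
print(list(key_value_records(data)))
# [('apple', 'fruit'), ('carrot', 'vegetable')]
```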
  84. Filtering File Inputs •  FileInputFormat will read all files out of a specified directory and send them to the mapper •  It delegates filtering of this file list to a method that subclasses may override –  e.g., create your own "xyzFileInputFormat" to read *.xyz from the directory list
  85. InputSplit •  One map task for each split •  For each record in the InputSplit, the framework calls –  setup –  map –  cleanup •  Intermediate values –  Grouped by the framework –  Passed to the Reducer –  Control sorting using the RawComparator class –  Use a combiner class to perform local aggregation •  Use a Partitioner to control which outputs go to which Reducer •  Use CompressionCodecs to compress the intermediate output
  86. Input Split Size •  FileInputFormat will divide large files into chunks –  The exact size is controlled by mapred.min.split.size •  RecordReaders receive the file, offset, and length of the chunk •  Custom InputFormat implementations may override the split size –  e.g., "NeverChunkFile"
  87. Record Readers •  Each InputFormat provides its own RecordReader implementation –  Provides (unused?) multiplexing capability •  LineRecordReader – reads a line from a text file •  KeyValueRecordReader – used by KeyValueTextInputFormat
  88. Partitioner •  Partitions the space of intermediate map output keys •  The partition is derived from the key, typically with a hash function •  The total number of partitions is the same as the number of reduce tasks
  89. Reduce •  Reduces a set of intermediate values which share a key to a smaller set of values •  Has 3 phases –  Shuffle: copies the sorted output from each mapper across the network using HTTP –  Sort: sorts the reduce inputs by key; the shuffle and sort phases occur simultaneously; a secondary sort is possible using custom functions –  Reduce: the framework calls the reduce function for each key and its collection of values
  90. Sending Data to Reducers •  The map function receives an OutputCollector object –  OutputCollector.collect() takes (k, v) elements •  Any (WritableComparable, Writable) pair can be used
  91. WritableComparator •  Compares WritableComparable data –  Will call WritableComparable.compare() –  Can provide a fast path for serialized data •  JobConf.setOutputValueGroupingComparator()
  92. Sending Data to the Client •  A Reporter object sent to the Mapper allows simple asynchronous feedback –  incrCounter(Enum key, long amount) –  setStatus(String msg) •  Allows self-identification of input –  InputSplit getInputSplit()
  93. Partition and Shuffle (diagram: Mappers → intermediates → Partitioners → shuffling → intermediates → Reducers)
  94. Partitioner •  int getPartition(key, val, numPartitions) –  Outputs the partition number for a given key –  One partition == the values sent to one reduce task •  HashPartitioner is used by default –  Uses key.hashCode() to compute the partition number •  JobConf sets the Partitioner implementation
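HashPartitioner's logic can be sketched like this (illustrative Python; `hash()` stands in for Java's `hashCode()`, and masking to a non-negative value before the modulo mirrors what the Java version does):

```python
# Hash partitioning: every occurrence of a key maps to the same partition,
# so all of that key's values reach the same reduce task.
def get_partition(key, num_partitions):
    return (hash(key) & 0x7FFFFFFF) % num_partitions

num_reducers = 4
keys = ["1949", "1950", "1951"]
parts = {k: get_partition(k, num_reducers) for k in keys}

# Each partition number is in range, and the mapping is deterministic:
assert all(0 <= p < num_reducers for p in parts.values())
assert get_partition("1949", num_reducers) == parts["1949"]
```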
  95. Reduction •  reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter) •  Keys and values sent to one partition all go to the same reduce task •  Calls are sorted by key –  "earlier" keys are reduced and output before "later" keys
  96. Finally: Writing the Output (diagram: Reducers → RecordWriters → output files, via OutputFormat)
  97. OutputFormat •  Analogous to InputFormat •  TextOutputFormat – writes "key\tvalue\n" strings to the output file •  SequenceFileOutputFormat – uses a binary format to pack (k, v) pairs •  NullOutputFormat – discards output
  98. Installing Hadoop •  http://juliensimon.blogspot.in/2011/01/installing-hadoop-on-windows-cygwin.html •  http://blog.benhall.me.uk/2011/01/installing-hadoop-0210-on-windows_18.html