Hadoop is an open-source software framework . Hadoop framework consists on two main layers Distributed file system (HDFS) Execution engine (MapReduce) Supports data-intensive distributed applications. Licensed under the Apache v2 license. It enables applications to work with thousands of computation-independent computers and petabytes of data Hadoop is the popular open source implementation of map/reduce MapReduce is a programming model for processing large data sets MapReduce is typically used to do distributed computing on clusters of computers MapReduce can take advantage of locality of data, processing data on or near the storage assets to decrease transmission of data. The model is inspired by the map and reduce functions "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to slave nodes. The slave node processes the smaller problem, and passes the answer back to its master node. "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the final output Highly scalable file system 6k nodes and 120pb Add commodity servers and disks to scale storage and IO bandwidth Supports parallel reading & processing of data Optimized for streaming reads/writes of large files Bandwidth scales linearly with the number of nodes and disks Fault tolerant & easy management Built in redundancy Tolerate disk and node failure Automatically manages addition/removal of nodes One operator per 3k nodes Very Large Distributed File System 10K nodes, 100 million files, 10PB Assumes Commodity Hardware Files are replicated to handle hardware failure Detect failures and recover from them Optimized for Batch Processing Data locations exposed so that computations can move to where data resides Provides very high aggregate bandwidth Hdfs provides a reliable, scalable and manageable solution for working with huge amounts of data Future secure Hdfs has been deployed in clusters of 10 to 4k datanodes Used in production at companies such as yahoo! , FB , Twitter , ebay Many enterprises including financial companies use hadoop.