MapReduce: Model, Internals, and
      Its Improvements

                     June 29, 2012

                Kyong-Ha Lee
         bart7449@gmail.com



          Copyright © KAIST Database Lab. All Rights Reserved.
Outline
Three topics that I will discuss today:
♦   MapReduce programming model
♦   Anatomy of the MapReduce framework
    –   Basic principles about the MapReduce framework
    –   Hadoop internal architecture
    –   Not much discussion on implementation details, but will be
        happy to discuss them if there are any questions.
♦   A brief survey on the study of improving the
    conventional MapReduce framework


Big Data
♦   “A loosely-defined term used to describe data sets so large
    and complex that they become awkward to work with using
    an on-hand database in a single node.” - Wikipedia
♦   Data growth challenges are defined by:*
    –     Increasing volume (amount of data)
          »     e.g., UniProtKB, a protein knowledgebase, now exceeds
                108 GB as a single file.
    –     Velocity (speed of data in/out)
          »     6TB of new log data is collected at Facebook every day
                *source: A comparison of join algorithms for log processing in MapReduce, SIGMOD’10

    –     Variety (range of data types, sources)
          »      Multimedia formats, unstructured and semi-structured data.

        * Doug Laney, “3D Data Management: Controlling Data Volume, Velocity and Variety”, 2001

Commodity Clusters
♦   It is hard to store and process big data on a single
    machine in a timely manner
♦   Standard architecture emerging:
    –   Cluster of commodity Linux nodes
    –   Gigabit Ethernet interconnects
    –   Low-price commodity PCs
♦   How to organize computations on this architecture?
    –   Practical issues such as SW/HW failures




Fault Tolerance
    ♦      Cheap nodes fail frequently if you have many of them
           –     An average of 1.7 percent of drives fail within their first year, and
                 more than 8.6 percent of three-year-old drives fail per year*.
                 »     MTBF for 1 node ≈ 3 years
                 »     MTBF for 1,000 nodes ≈ 1 day on average
                       (3 years ≈ 1,095 days, so a 1,000-node cluster expects roughly one failure per day)
           –     An average of 1.29 percent of nodes suffer from
                 uncorrectable memory errors within a year*.
    ♦      Building fault tolerance into the system is therefore necessary
    ♦      Job posting from data center team at Google *
           –     One of the requirements:
                 »     “ able to lift/move 20-30 lbs equipment on a daily basis”
* Failure trends in a large disk population-FAST’07
* DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS, 2009
* http://www.xing.com/net/datacenter/possible-positions-130745/google-datacenter-roles-berlin-frankfurt-munich-9051768
A Simple Cluster Architecture
  [Cluster diagram] Racks of commodity nodes (CPU, memory, and disks per node);
  1 Gbps between any pair of nodes within a rack, and an 8 Gbps backbone between rack switches.
  Information about Yahoo! cluster used for TeraSort:
  •   Approximately 3,800 nodes; each rack contains 40 nodes
  •   2 quad core Xeons @ 2.5GHz per node, 8GB RAM, 4 SATA HDDs
  •   Redhat Enterprise Linux server R5.1 (Kernel 2.6.18)
  •   Sun JDK 1.6.0_05 and 1.6.0_13
The Need for Stable Storage
♦   Distributed File System
    –   Provides global file namespace
    –   Typical usage pattern: huge files (100s of GB to TB)
        »   Data is rarely updated in place
        »   Reads and appends are common I/O patterns
♦   Google GFS; Hadoop HDFS
    –   Master manages metadata
    –   Data transfers happen directly between clients/chunk servers
    –   Files broken into blocks (typically 64 MB)
    –   Data replication (typically 3 replicas, asynchronous)
    –   Immutable data blocks
GFS Design




Principles of Parallel Processing
♦   Large-scale data processing
    –   Googlers want to use 1,000s of CPUs
    –   But don’t want hassle of managing things

♦   They also want their system to provide:
    –   Automatic parallelization & distribution
    –   Fault-tolerance during processing
    –   Efficient I/O scheduling
    –   Monitoring & status updates




MapReduce
♦   A useful solution for big data processing
♦   Both a programming model and a framework for
    massive parallel processing of large datasets with many
    commodity machines
    –   Popularized and controversially patented by Google Inc.
    –   Analogous to Group-By-Aggregation in databases
♦   Easy to distribute a job across nodes
    –   Support of data parallelism
♦   No hassle of managing jobs across nodes
    –   Hides the details of parallel execution, allowing users to focus
        only on data processing strategies
♦   Nice retry/failure semantics
♦   Runtime scheduling with speculative execution
Its Importance and Impact
♦   “Data center is the computer. If MapReduce is the first
    instruction of the data center computer, I can’t wait to
    see the rest of the instruction set, as well as the data
    center programming language, the data center operating
    system, the data center storage systems, and more.”
     - David A. Patterson. Technical perspective: the data center is the
    computer. CACM, 51(1):105, 2008.
♦   A list of institutions that are using Hadoop, an open-source
    Java implementation of MapReduce
♦   Its scholarly impact!



[Figures as of Dec 31, 2011 and June 28, 2012]
Usage Statistics at Google
                                 Aug ’04      Mar ’06      Sep ’07      Sep ’09
The number of jobs                   29K         171K       2,217K       3,467K
Average completion time (secs)       634          874          395          475
Machine years used                   217        2,002       11,081       25,562
Input data read (TB)               3,288       52,254      403,152      544,130
Intermediate data (TB)               758        6,743       34,774       90,120
Output data written (TB)             193        2,970       14,018       57,520
Average worker machines              157          268          394          488

* source: J. Dean, Design, Lessons, Advices from Building Large Distributed System, Keynote, LADIS 2009.




MapReduce Programming Model
♦   Input: a set of key/value pairs
♦   A user implements two functions:
    –   map(key1, value1) → (key2, value2)
    –   reduce(key2, a list of value2) → (key3, value3)
♦   (key2, value2) is an intermediate key/value pair
♦   Output is the set of (k3,v3) pairs
♦   Many problems can be phrased in this way
    –   But not all of them!




Example : Building Inverted Index
map(key, value):
// key: document id; value: text of the document
     for each word w in value:
          emit(w, key)
reduce(key, values):
// key: a word w; values: a list of docIds
          initialize a list L
          for each docId d in values:
                    put d into L
          emit(key, L)

 output: (word, a sorted list of docIds)
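To make the pseudocode above concrete, here is a minimal, runnable Python sketch that simulates the map, shuffle/group, and reduce phases of this inverted-index job on a single machine; the tiny docs input and the function names are illustrative only and are not part of Hadoop's API.

from collections import defaultdict

# Toy input: docId -> document text (illustrative data only)
docs = {1: "map reduce map", 2: "reduce join", 3: "map join join"}

def map_fn(doc_id, text):
    # emit one (word, docId) pair per word occurrence
    for word in text.split():
        yield word, doc_id

def reduce_fn(word, doc_ids):
    # build a sorted, de-duplicated posting list for the word
    return word, sorted(set(doc_ids))

# map phase, followed by grouping the intermediate pairs by key ("shuffle")
intermediate = defaultdict(list)
for doc_id, text in docs.items():
    for word, d in map_fn(doc_id, text):
        intermediate[word].append(d)

# reduce phase
inverted_index = dict(reduce_fn(w, ids) for w, ids in intermediate.items())
print(inverted_index)  # {'map': [1, 3], 'reduce': [1, 2], 'join': [2, 3]}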


Map stage in a M/R Job
[Diagram] Input key-value pairs (k, v) are fed to map tasks, each of which emits intermediate key-value pairs (k, v).
Reduce Stage in a M/R Job

[Diagram] Intermediate key-value pairs are grouped by key into key-value groups; each reduce task consumes one group at a time and emits output key-value pairs.
Data in a M/R Job
♦   Input and final output are stored on DFS
    –   Scheduler tries to schedule map tasks “close” to physical
        storage location of input data
    –   Fault tolerance guaranteed by replicas.
♦   Intermediate results are stored on local disks of map and
    reduce workers for fault tolerance.
    –   Until the next tasks are fully completed.
♦   Outputs of a M/R job often become inputs of another MR
    job
    –   Consecutive or iterative M/R jobs for a single work
        »   e.g. binary joins and PageRank



Parallel Execution across Nodes
1.   Partition input key/value pairs into blocks and then
     run map() tasks in parallel
2.   After all map()s are complete, consolidate all emitted
     values for each unique emitted key
3.   Now partition space of output map keys, and run
     reduce() in parallel
4.   In reduce(), values for each key are grouped together and then
     aggregated; the reduced outputs are stored on DFS



Combiner
♦   Often a map task will produce many pairs of the form
    (k,v1), (k,v2), … for the same key k
    –   e.g., popular words in Word Count

♦   It can save many I/Os by pre-aggregating in mappers
    –   combine(k1, list(v1)) → v2
    –   Usually same as reduce function; in-mapper aggregation

♦   Combiners work correctly only if the reduce function is
    commutative and associative (see the sketch below)
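A small single-machine Python sketch (not Hadoop code) of Word Count with in-mapper combining: each mapper pre-aggregates its own counts before the shuffle, which is safe here because summation in the reducer is commutative and associative. The splits data and function names are illustrative.

from collections import Counter, defaultdict

# one input split per mapper (illustrative data only)
splits = [["to", "be", "or", "not", "to", "be"], ["to", "map", "or", "to", "reduce"]]

def map_and_combine(words):
    # map: emit (word, 1); combine: pre-aggregate counts locally in the mapper
    return Counter(words)            # e.g. {'to': 2, 'be': 2, ...}

def reduce_fn(word, partial_counts):
    # summation is commutative and associative, so combining is safe
    return word, sum(partial_counts)

# shuffle the (already combined) partial counts by key
grouped = defaultdict(list)
for split in splits:
    for word, partial in map_and_combine(split).items():
        grouped[word].append(partial)

word_count = dict(reduce_fn(w, c) for w, c in grouped.items())
print(word_count)  # {'to': 4, 'be': 2, 'or': 2, 'not': 1, 'map': 1, 'reduce': 1}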




System Architecture
[Diagram] Input blocks (Block 1 … Block n) → Mapper (map, local sort, combiner) → intermediate results on local disks → barrier → Reducer pulls via shuffle/copy, then merge and reduce → Output
Sort-Merge in MapReduce
[Diagram] Each map task sorts its (k, v) outputs locally; the sorted runs are shuffled to the reducers and merged by key, so that reduce() sees each key with its full list of values.
Data Partitioning
♦   Inputs to map tasks are created by contiguous splits of the input file
♦   For reduce, we need to ensure that records with the same
    intermediate key end up at the same worker
♦   System uses a default partition function e.g., hash(key) mod R
♦   Sometimes useful to override
    –   e.g., hash(hostname(URL)) mod R ensures URLs from a host end up
        in the same output file
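A hedged Python sketch of the default and the overridden partition functions described above; R, the example URLs, and the function names are illustrative, and a real Hadoop job would implement this as a custom Partitioner class instead.

from urllib.parse import urlparse

R = 4  # number of reduce tasks (illustrative)

def default_partition(key):
    # default partition function: hash(key) mod R
    return hash(key) % R

def host_partition(url):
    # override: hash(hostname(URL)) mod R, so every URL from the same host
    # is sent to the same reducer and thus lands in the same output file
    return hash(urlparse(url).netloc) % R

urls = ["http://a.com/x", "http://a.com/y", "http://b.org/z"]
print([host_partition(u) for u in urls])  # the first two share a partition
# (Python salts str hashes per process; a real partitioner would use a stable hash.)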




Fault Tolerance
♦   If tasks fail, they are executed again on another node
    –   Detect failure via periodic heartbeats
    –   Re-execute in-progress map tasks
    –   Re-execute in-progress reduce tasks
♦   If a node crashes:
    –   Re-launch its current tasks on other nodes
    –   Re-run any maps the node previously ran
        » Necessary because their output files were lost along with the
           crashed node
♦   If a task is going slowly (straggler):
    –   Launch second copy of task on another node (“speculative
        execution”)
    –   Take the output of whichever copy finishes first, and kill the
        other

♦   Hadoop Distributed File System
    –   An open-source clone of GFS
    –   Write-once and read-many
    –   Data partitioned into 64MB or 128MB blocks, each block
        replicated 3 times.
    –   NameNode holds filesystem metadata
    –   Files are broken up and spread over the DataNodes
♦   Hadoop MapReduce
    –   Java implementation of MapReduce
    –   JobTracker schedules and manages jobs
    –   TaskTracker executes individual map() and reduce() tasks on
        each node.


A Brief History of Apache Hadoop
Branches & Releases




*source: http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/

Hadoop 2 alpha:
HDFS Federation
MapReduce V2, a.k.a. YARN
Hadoop Ecosystem




System Behavior on a Single Node




*Source: A comparison of join algorithms for log processing in MapReduce, SIGMOD’10
Overall Execution Time




*Source: A platform for scalable one-pass analytics using MapReduce, SIGMOD’11
Criticism
♦   D. DeWitt and M. Stonebraker harshly criticized MapReduce as
    “a major step backwards” [5].
    –   They first regarded it as a simple Extract-Transform-Load tool.
♦   A technical comparison was done by Pavlo et al. [6]
    –   Compared MapReduce with a commercial row-wise DBMS and with Vertica
    –   After that, technical debates between researchers and
        practitioners were triggered
♦   CACM welcomed this technical debate, inviting both
    sides to the Communications of the ACM, Jan 2010 [7,8]


The Technical Debate




*Source: A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD’09




Advantages
♦   Simple and easy to use
    –   Users code only Map() and Reduce()
    –   BashReduce implements MapReduce for standard Unix commands such as
        sort, awk, grep, join etc.
    –   Users need not consider how to distribute their job
♦   Flexible
    –   No data model, no schema
    –   Users can treat any irregular or unstructured data with MapReduce
♦   Independent of the underlying storage
♦   Fault tolerance
    –   Users need not worry about failures during a run
    –   It is reported that MapReduce continues to work in spite of an avg. of 1.2
        failures per job at Google.
♦   High scalability
    –   Yahoo! reported that its Hadoop clusters could scale out to more than 4,000 nodes
        in 2008.
Pitfalls
♦   No high-level language
    –   M/R itself does not support any high-level language or query optimization
♦   Schema-free and index-free
    –   Impromptu data processing throws away the benefits of data modeling
    –   Requires data parsing and full scan
♦   A single fixed dataflow
    –   Ease of use with a simple abstraction
    –   Complex algorithms are hard to phrase as a single M/R job.
♦   Low efficiency
    –   Sacrifice of disk I/O for fault-tolerance
        »    Result materialization on local disks in each step -> no pipelining
        »    Three replicas on DFS
    –   Block level restarts; a simple heuristic runtime scheduling with speculative execution
♦   Blocking operators
    –   Caused by merge-sort for grouping values
    –   Reduce() begins only after all map tasks end
♦   Very young compared to over 40 years of database technology
    –   Not mature yet and third-party tools are still relatively few
A Short List of Improvements
♦   Sacrifice of disk I/O for fault-tolerance
    –   Main difference against DBMS
    –   Compression techniques like LZO and snappy, …
    –   Column-wise storage: RCFile, CIF, CoHadoop, DREMEL, . . .
♦   A single fixed dataflow
    –   Dryad, SCOPE, Nephele/PACT, ..
    –   Map-Reduce-Merge for binary operators, Map-Join-Reduce and some join techniques
    –   Twister & HaLoop for iterative workloads
    –   Map-Join-Reduce & join algorithms in MapReduce
♦   No schema
    –   Protocol buffers, JSON, Avro, …
♦   No indexing
    –   HadoopDB, Hadoop++, Trojan, …
♦   No high-level language
    –   Hive, Sawzall, SCOPE, Pig Latin, …, Jaql, Dryad/LINQ
♦   Blocking operators
    –   MapReduce Online, Mortar, MR-Hash, …
♦   A simple heuristic scheduling / load balancing
    –   LATE, Leen, SkewTune, …
♦   Relatively poor performance
    –   Adaptive and automatic performance tuning
    –   Work sharing / multiple jobs
        »    MRShare, ReStore
        »    Hive, Pig Latin
        »    fair/capacity sharing, ParaTimer
♦   Cowork with other tools
    –   SQL/MapReduce, HadoopDB, Teradata EDW’s Hadoop integration, Oracle in-database Hadoop, …
♦   DW based on MR
    –   Cheetah, Osprey, Tenzing, RICARDO, …
♦   Optimization
    –   Self-tuning, multi-query processing, query optimization
♦   Other complements
Data Skewness in M/R
♦   When skew arises, some partitions of an operation take longer to
    process, slowing down the entire computation.
♦   Types
    –   (1) skew caused by an uneven distribution of input data to operator
        partitions (or tasks), and
    –   (2) skew caused by some portions of the input data taking longer to
        process than others.
♦   Map-skew / reduce-skew

[Figure] A timing chart of a M/R job running the PageRank algorithm from Cloud 9
*A Study of Skew in MapReduce Applications, OpenCirrus’11
Simple Solution for reduce-skew
♦     Assigning ReduceIds to mapped outputs as keys
      according to statistics acquired by sampling or
      scanning input data
      –   For skewed data, assign more ReduceIds for repartition
♦     Pros and cons
      –   Simple, no modification of M/R internals
      –   But it imposes additional overhead, i.e. an extra (1st) M/R job
[Diagram] Job 1: Map → mapped output → Reduce → statistics; simple delivery of the statistics to Job 2: Map (assign ReduceId) → shuffle by ReduceId → Reduce.
SkewTune
♦   Key challenges
    –   Require no extra input from the user, yet work for any M/R
        implementation
    –   Impose minimal overhead if there is no skew.
♦   Detecting and mitigating skew at runtime
    –   When a node becomes idle, it identifies the task with the
        greatest expected remaining processing time.
    –   Unprocessed input data of the straggling task is then proactively
        repartitioned.
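A toy Python sketch of the runtime decision described above, assuming per-task elapsed time and progress are observable (Hadoop exposes task progress reports). It is not SkewTune's actual code, only the "pick the task with the greatest expected remaining time" heuristic, with linear extrapolation as an assumed progress model.

def expected_remaining(elapsed_secs, progress):
    # linear extrapolation: if fraction p of the input took t seconds,
    # the remaining (1 - p) is expected to take t * (1 - p) / p
    if progress <= 0.0:
        return float("inf")
    return elapsed_secs * (1.0 - progress) / progress

# (task_id, elapsed seconds, fraction of input processed) -- illustrative numbers
running_tasks = [("t1", 120, 0.9), ("t2", 300, 0.25), ("t3", 200, 0.5)]

# when a node becomes idle, pick the straggler whose unprocessed input
# is worth repartitioning across the idle resources
straggler = max(running_tasks, key=lambda t: expected_remaining(t[1], t[2]))
print(straggler)  # ('t2', 300, 0.25): roughly 900 more seconds of work expected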




Schema Support
♦   MapReduce parses each data record at reading time
    –   Causing performance degradation
♦   Serialization formats are used instead.
    –   Google’s protocol buffers, JSON, Apache’s Thrift and Avro.
♦   Compression techniques are sometimes used
    together to reduce I/O costs.
    –   LZO, snappy, …
    –   Fast compression/decompression is sometimes more important
        than compression ratio.
    –   Some processing can even be done directly on compressed data, e.g.
        run-length encoding.
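A minimal Python illustration of the two points above, using JSON as the record format and gzip as a stand-in for LZO/snappy; the record fields are illustrative, and the formats and codecs named on this slide are not actually used here.

import gzip
import json

records = [{"user": 1, "url": "http://a.com", "bytes": 512},
           {"user": 2, "url": "http://b.org", "bytes": 1024}]  # illustrative records

# serialize records to a line-oriented block (JSON standing in for Avro/protobuf/Thrift)
serialized = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# compress the block before writing it to the DFS (gzip standing in for LZO/snappy)
compressed = gzip.compress(serialized)
print(len(serialized), "->", len(compressed), "bytes")

# a map task would decompress and parse each record back at read time
for line in gzip.decompress(compressed).decode("utf-8").splitlines():
    record = json.loads(line)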

HadoopDB
♦   A hybrid system that connects
    multiple single-site DBMSs with
    M/R
♦   Combining M/R-style
    scalability and the
    performance of DBMS
♦   It utilizes M/R as a distribution
    layer
    –   Queries are written in SQL and
        distributed via M/R across
        nodes.
    –   Data processing is boosted by
        the features of DBMS engines
        as much as possible.

Twister and HaLoop




Dryad
♦   Allows users to design and execute
    a dataflow graph as they
    wish
♦   A form of DAG
    –   Vertex: a program that can
        run when all inputs are ready
    –   Channel: a file, a TCP
        pipe, or shared memory, ..
♦   A logical dataflow graph is
    mapped onto physical
    resources by a job scheduler
    at runtime.


MapReduce Online
  ♦    Addresses the issue that pull-based communication and
       checkpointing of mapped outputs limit pipelined processing.
  ♦    Support online aggregation and continuous queries in
       MapReduce
  ♦    Mappers periodically push their temporarily stored outputs to
       reducers within the same M/R job.
  ♦    Enables approximate answers and stream processing
       –     Can also reduce the response times of jobs
[Diagram] HDFS input blocks → map tasks, which periodically push their outputs → reduce tasks → HDFS; reducers also write snapshot answers.
Hash-based Partitioning
♦   Merge sort affects both the performance and the nature of M/R
♦   As soon as each map task outputs its intermediate results, the
    results are hashed and pushed to hash tables held by reducers.
♦   Reducers perform aggregation on the value in each bucket.
♦   No sort-based grouping is required
♦   On-the-fly aggregation is possible even before all mappers have completed
    (see the sketch below).
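A small Python sketch of hash-based aggregation on the reduce side: mapped outputs are folded into a per-reducer hash table as they arrive, so no sort or separate grouping pass is needed. The data and the sum aggregate are illustrative.

from collections import defaultdict

# partial aggregates held by one reducer, keyed by the intermediate key
hash_table = defaultdict(int)

def on_mapped_output(key, value):
    # called as soon as a map task emits (key, value): hash the key into a
    # bucket and fold the value into the running aggregate -- no sorting needed
    hash_table[key] += value

# mapped outputs arriving incrementally, even before all mappers finish (illustrative)
for key, value in [("a", 1), ("b", 4), ("a", 2), ("c", 7), ("a", 3)]:
    on_mapped_output(key, value)

print(dict(hash_table))  # {'a': 6, 'b': 4, 'c': 7}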




Column-wise Storages
   ♦     Column-oriented storage formats
         –   read-optimized by avoiding unnecessary column reads during table scans
          –   They outperform the other structures in most cases.
         –   RCFile uses column-wise compression and thus provides efficient storage
             space utilization.
   ♦     Systems
         –   Google’s BigTable, Dremel
          –   Facebook’s Record Columnar File (RCFile), CIF, Llama, . . .




[Figures] RCFile and CIF storage layouts
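A toy Python comparison of row-wise and column-wise layouts for the table-scan point above; the three-column rows data is illustrative, and real formats such as RCFile additionally compress each column.

# row-wise layout: every scan touches all fields of every row
rows = [(1, "GET", 512), (2, "PUT", 2048), (3, "GET", 128)]  # (id, method, bytes)

# column-wise layout: one array per column
columns = {
    "id":     [r[0] for r in rows],
    "method": [r[1] for r in rows],
    "bytes":  [r[2] for r in rows],
}

# a query that only needs 'bytes' reads a single column instead of whole rows,
# which is what makes columnar formats read-optimized for selective scans
total_bytes = sum(columns["bytes"])
print(total_bytes)  # 2688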
Block Collocation




Sharing Multiple Query Executions
in M/R
♦   Inspired by multi-query optimization techniques in databases
♦   Three sharing opportunities across multiple MR jobs:
    –   scan sharing, mapped outputs sharing, and Map function sharing
♦   Intermediate result sharing improves the execution time significantly.
♦   Sharing all scans may yield poor performance because of the cost of
    merge-sort (see the arithmetic sketch below).
    –   Without scan sharing, n jobs over a |D|-size input read |D| * n in total,
        and sorting costs |D|log|D| * n
    –   With scan sharing, the input is read once, but sorting the combined
        intermediate data costs n*|D|*log(n*|D|)
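A back-of-the-envelope comparison of the two sort costs above; the numbers are arbitrary and the model is the idealized |D|·log|D| sort cost used on this slide, not measured results.

import math

D = 1_000_000        # records in the shared input (illustrative)
for n in (2, 4, 16, 64):
    independent = n * D * math.log2(D)       # n jobs, each sorts |D| on its own
    shared      = n * D * math.log2(n * D)   # one shared scan, sort n*|D| together
    print(n, "jobs: shared/independent sort cost =", round(shared / independent, 3))

# the ratio log(n*|D|)/log(|D|) grows with n, so sharing all scans can lose to
# independent jobs once n is large -- matching the caveat above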




Reusing Results of M/R jobs
♦   The current practice is to delete intermediate results from DFS
♦   It would be more useful if these intermediate results could be stored and
    reused in future M/R jobs.
♦   ReStore matches input workflows of M/R jobs with previously
    executed jobs and rewrites the workflows to reuse the stored
    results of the matched jobs.




Research Challenges and Issues
♦   Parallelizing conventional algorithms
    –   Finding algorithms that are easy to phrase as M/R jobs.
    –   Or developing new algorithms for that purpose.
    –   Not good for ad-hoc queries
♦   Performance Improvements
    –   M/R does not utilize modern HW features well
        »   Multi-core, GPGPU, SSD, etc
    –   Some caveats still exist in the model
        »   e.g. iterative and incremental processing
    –   Self-tuning
        »   150+ tuning knobs in Hadoop
        »   Long-running analysis and batch processing
    –   Real-time MapReduce

Summary
♦   M/R is simple, but provides good scalability and fault-
    tolerance for massive data processing
♦   M/R does not substitute DBMS
♦   M/R complements DBMS with scalable and flexible
    parallel processing for various data analysis
♦   The I/O efficiency of MapReduce still needs to be
    addressed for more successful applications
    –   sort-merge based grouping and frequent checkpoints
♦   Many application domains and room for improvements

Thank you!
Questions or comments?




References
1.     David A. Patterson. Technical perspective: the data center is the computer. Communications of ACM, 51(1):105,
       2008.
2.     Hadoop users list: http://wiki.apache.org/hadoop/PoweredBy
3.     Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, In Proceedings of
       OSDI 2004 and CACM Vol. 51, No. 1 pp. 107-113, 2008
4.     S. Ghemawat and et al. The Google File System, ACM SIGOPS Operating Systems Review, Vol. 37, No. 5 pp. 29-
       43, 2003
5.     David J. DeWitt and Michael Stonebraker, MapReduce: a major step backwards, Database column blog, 2008
6.     Andrew Pavlo and et al. A Comparison of Approaches to Large-Scale Data Analysis, In Proceedings of SIGMOD
       2009
7.     Michael Stonebraker and et al. MapReduce and Parallel DBMSs: Friends or Foes?, Communications of ACM, Vol
       53, No. 1 pp. 64-71, Jan 2010
8.     Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM,
       Vol. 53, No. 1 pp. 72-72 Jan 2010
9.     M. Stonebraker, The case for shared-nothing. Data Engineering Bulletine, 9(1):4-9, 1986
10.    D. DeWitt and J. Gray, Parallel database systems: the future of high performance database systems,
       Communications of the ACM 35(6):85-98, 1992
11.    B. Schroeder and et al. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In
       Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST), pages 1–16, 2007.
12.    B. Schroeder and et al. DRAM errors in the wild: a large-scale field study. In Proceedings of the eleventh
       international joint conference on Measurement and modeling of computer systems, pages 193–204. ACM New York,
       NY, USA, 2009



13.   G.M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In
      Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483–485. ACM, 1967.
14.   J.L. Gustafson. Reevaluating Amdahl’s law. Communications of the ACM, 31(5):532–533, 1988.
15.   A.H. Karp and H.P. Flatt. Measuring parallel processor performance. Communications of the ACM,
      33(5):539–543, 1990.
16.   Apache Foundation, MapReduce V0.21.0 Tutorial,
      http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html, 2010
17.   Incremental MapReduce, TV’s cobweb blog, http://eagain.net/articles/incremental-mapreduce/
18.   Y. Bu and et al. HaLoop: Efficient Iterative Data Processing on Large Clusters, In Proceedings of VLDB’10
19.   J. Ekanayake and et al. Twister: A Runtime for Iterative MapReduce, In Proceedings of ACM HPDC’10 pp.
      810-818, 2010
20.   M. Isard and et al. Dryad: Distributed data-parallel programs from sequential building blocks. In
      Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, page
      72. ACM, 2007.
21.   R. Chaiken and et al. Scope: easy and efficient parallel processing of massive data sets. PVLDB:
      Proceedings of Very Large Data Base Endowment, 1(2):1265–1276, 2008.
22.   C. Olston and et al. Pig Latin: a not-so-foreign language for data processing. In SIGMOD ’08: Proceedings
      of ACM SIGMOD Conference, pages 1099–1110, 2008.
23.   A. Gates and et al. Building a high level dataflow system on top of MapReduce: The pig experience.
      PVLDB: In Proceedings of VLDB, 2(2):1414–1425, 2009.
24.   R. Pike and et al. Interpreting the Data: Parallel Analysis with Sawzall, Scientific Programming, Vol. 13 No.
      4, pp. 277-298, 2005
25.   A. Thusoo and et al. Hive- A Warehousing Solution over a Map-Reduce Framework. PVLDB: Proceedings
      of Very Large Data Base Endowment, 2009
26.   A. Thusoo and et al. Hive - a petabyte scale data warehouse using hadoop. In Proceedings of ICDE 2010



27.   Y. Yu and et al. DryadLINQ: A system for general-purpose distributed data-parallel computing using a
      high-level language. In OSDI ’08: Proceedings of Symposium on Operating System Design and
      Implementation, 2008
28.   M. Isard and et al. Distributed Data-Parallel Computing Using a High-Level Programming Language, In
      Proceedings of SIGMOD 2009
29.   D. Logothetis and et al. Ad-Hoc Data Processing in the Cloud, In Proceedings of VLDB’08
30.   T. Condie and et al. MapReduce Online, In Proceedings of USENIX NSDI, 2010
31.   A. Alexandrov and et al. Massively Parallel Data Analysis with PACTs on Nephele, In Proceedings of
      VLDB Vol. 3 No.2, 2010
32.   D. Battré and et al. Nephele/PACTs: a programming model and execution framework for web-scale
      analytical processing, In Proceedings of SoCC 2010
33.   Eric Friedman and et al. SQL/MapReduce: A practical approach to self-describing, polymorphic, and
      parallelizable user defined functions. PVLDB: Proceedings of Very Large Data Base Endowment,
      2(2):1402–1413, 2009.
34.   A. Abouzeid and et al. HadoopDB: An architectural hybrid of mapreduce and dbms technologies for
      analytical workloads. VLDB’09: Proceedings of Very Large Data Base Endowment, pages 1084–1095,
      2009.
35.   Y. Xu and et al. Integrating Hadoop and Parallel DBMS, In Proceedings of ACM SIGMOD, pp. 969-974,
      2010
36.   S. Das and et al. Ricardo: Integrating R and Hadoop, In Proceedings of ACM SIGMOD pp. 987-998, 2010
37.   J. Dittrich and et al. Hadoop++ Making a Yellow Elephant Run like a Cheetah (Without it Even Noticing), In
      Proceedings of VLDB’10
38.   S. Chen, Cheetah: A High Performance Custom Data Warehouse on top of MapReduce, In Proceedings
      of VLDB, Vol. 3, No. 2, 2010
39.   S. Melnik and et al. Dremel: Interactive Analysis of Web-Scale Datasets, In Proceedings of VLDB Vol. 3,
      No. 1, 2010


40.   C. Yang and et al. Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing
      Distributed Database, In Proceedings of IEEE ICDE pp. 657-668, 2010
41.   M. Zaharia and et al. Improving MapReduce Performance in Heterogeneous Environments, In
      Proceedings of USENIX OSDI’08
42.   H. Yang, and et al., Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, In
      Proceedings of SIGMOD’07
43.   D. Jiang and et al. Map-Join-Reduce: Towards Scalable and Efficient Data Analysis on Large Clusters,
      IEEE Transactions on Knowledge and Data Engineering, preprint
44.   S. Blanas and et al. A Comparison of Join Algorithms for Log Processing in MapReduce, In Proceedings
      of SIGMOD’10
45.   F. N. Afrati and et al. Optimizing Joins in a Map-Reduce Environment, in Proceedings of EDBT 2010
46.   R. Vernica and et al. Efficient Parallel Set-Similarity Joins Using MapReduce, In Proceedings of
      SIGMOD’10
47.   T. Nykiel and et al. MRShare: Sharing Across Multiple Queries in MapReduce, In Proceedings of VLDB’10
48.   K. Morton and et al. Estimating the progress of MapReduce Pipelines, In Proceedings of IEEE ICDE pp.
      681-684, 2010
49.   K. Morton and et al. ParaTimer: A Progress Indicator for MapReduce DAGs, In Proceedings of ACM
      SIGMOD, pp. 507-518, 2010
50.   S. Papadimitriou and et al. DisCo: Distributed Co-clustering with Map-Reduce, In Proceedings of IEEE
      ICDM pp. 512-521, 2009
51.   C. Wang and et al. MapDupReducer : detecting near duplicates over massive datasets, In Proceedings of
      ACM SIGMOD pp. 1119-1122, 2010
52.   S. Babu, Towards Automatic Optimization of MapReduce Programs, In Proceedings of ACM SoCC’10
53.   D. Jiang and et al. The Performance of MapReduce: An In-depth Study, In Proceedings of VLDB’10
54.   E. Jahani and et al. Automatic Optimization for MapReduce Programs, Proceedings of VLDB Vol.4, No. 6 ,
      2011


55.   B. Catanzaro and et al. A Map Reduce Framework for Programming Graphic Processors, In Proceedings
      of Workshop on Software Tools for Multicore Systems, 2008
56.   B. He and et al. Mars: A MapReduce framework on graphics processors, In Proceedings of PACT’08 pp.
      260-269, 2008

57.   W. Jiang and et al. A Map-Reduce System with an Alternate API for Multi-Core Environments, In
      Proceedings of 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010
58.   Jeff Dean, Design, Lessons, Advices from Building Large Distributed System, Keynote , LADIS 2009.
59.   Willis Lang and et al. Energy Management for MapReduce Clusters, Proceedings of VLDB Vol. 3 No. 1,
      2010
60.   W. Xiong and et al. Energy Efficient Data Intensive Distributed Computing, Data Engineering Bulletin Vol.
      34, No. 1, pp. 24-33, March 2011
61.   E. Anderson and et al. Efficiency Matters!, ACM SIGOPS Operating Systems Review, 44(1):40-45, 2010
62.   Jimmy Lin and Chris Dyer, Data-Intensive Text Processing, Book
63.   G. Malewicz and et al. Pregel: A System for Large-Scale Graph Processing, In Proceedings of PODC’09
64.   J. Ekanayake and et al. MapReduce for Data Intensive Scientific Analyses, In Proceedings of IEEE
      eScience’08
65.   K. B. Hall and et al. MapReduce/BigTable for Distributed Optimization , NIPS LCCC Workshop 2010
66.   PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce
67.   MC Schatz, CloudBurst: Highly sensitive read mapping with MapReduce, Bioinformatics, Vol 25, No. 11
68.   B. Fan and et al. DiskReduce: RAID for data-intensive scalable computing, In Proceedings of the 4th
      Annual workshop on Petascale Data Storage, pp. 6-10, 2009
69.   K. Lee and et al. Parallel data processing with MapReduce: a survey, The SIGMOD Record, Vol 40, No. 4,
      pp.11-20, 2011





DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

KIISE:SIGDB Workshop presentation.

  • 6. A Simple Cluster Architecture (diagram: racks of commodity nodes, each with its own CPUs, memory, and disks, connected through rack switches) • 8 Gbps backbone between racks; 1 Gbps between any pair of nodes in a rack • Information about the Yahoo! cluster used for TeraSort: approximately 3,800 nodes; each rack contains 40 nodes; 2 quad-core Xeons @ 2.5 GHz per node, 8 GB RAM, 4 SATA HDDs; Red Hat Enterprise Linux Server 5.1 (kernel 2.6.18); Sun JDK 1.6.0 (updates 05 and 13) Copyright © KAIST Database Lab. All Rights Reserved. 6
  • 7. The Need for Stable Storage ♦ Distributed File System – Provides a global file namespace – Typical usage pattern: huge files (100s of GB to TB) » Data is rarely updated in place » Reads and appends are the common I/O patterns ♦ Google GFS; Hadoop HDFS – Master manages metadata – Data transfers happen directly between clients and chunk servers – Files broken into blocks (typically 64 MB) – Data replication (typically 3 replicas, asynchronous) – Immutable data blocks Copyright © KAIST Database Lab. All Rights Reserved. 7
  • 8. GFS Design Copyright © KAIST Database Lab. All Rights Reserved. 8
  • 9. Principles of Parallel Processing ♦ Large-scale data processing – Googlers want to use 1,000s of CPUs – But don’t want hassle of managing things ♦ They also want their system to provide: – Automatic parallelization & distribution – Fault-tolerance during processing – Efficient I/O scheduling – Monitoring & status updates Copyright © KAIST Database Lab. All Rights Reserved. 9
  • 10. MapReduce ♦ A useful solution for big data processing ♦ Both a programming model and a framework for massive parallel processing of large datasets with many commodity machines – Popularized and controversially patented by Google Inc. – Analogous to Group-By-Aggregation in databases ♦ Easy to distribute a job across nodes – Support of data parallelism ♦ No hassle of managing jobs across nodes – By hiding details of parallel execution and allow users to focus only on data processing strategies ♦ Nice retry/failure semantics ♦ Runtime scheduling with speculative execution Copyright © KAIST Database Lab. All Rights Reserved. 10
  • 11. Its Importance and Impact ♦ “Data center is the computer. If MapReduce is the first instruction of the data center computer, I can’t wait to see the rest of the instruction set, as well as the data center programming language, the data center operating system, the data center storage systems, and more.” - David A. Patterson. Technical perspective: the data center is the computer. CACM, 51(1):105, 2008. ♦ A list of institutions that are using Hadoop, an open-source Java implementation of MapReduce ♦ Its scholastic impact! (figures as of Dec 31, 2011 and as of June 28, 2012) Copyright © KAIST Database Lab. All Rights Reserved. 11
  • 12. Usage Statistics at Google (Aug ‘04 / Mar ‘06 / Sep ‘07 / Sep ‘09): number of jobs 29K / 171K / 2,217K / 3,467K; average completion time (secs) 634 / 874 / 395 / 475; machine years used 217 / 2,002 / 11,081 / 25,562; input data read (TB) 3,288 / 52,254 / 403,152 / 544,130; intermediate data (TB) 758 / 6,743 / 34,774 / 90,120; output data written (TB) 193 / 2,970 / 14,018 / 57,520; average worker machines 157 / 268 / 394 / 488 *source: J. Dean, Design, Lessons, Advices from Building Large Distributed System, Keynote, LADIS 2009. Copyright © KAIST Database Lab. All Rights Reserved. 12
  • 13. MapReduce Programming Model ♦ Input: a set of key/value pairs ♦ A user implements two functions: – map(key1, value1) → (key2, value2) – reduce(key2, a list of value2) → (key3, value3) ♦ (key2, value2) is an intermediate key/value pair ♦ Output is the set of (key3, value3) pairs ♦ Many problems can be phrased in this way – But not for all! Copyright © KAIST Database Lab. All Rights Reserved. 13
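To make the contract above concrete, here is a minimal word-count sketch in plain Python (an illustration of the model, not code from the slides or from the Hadoop API): map emits intermediate (word, 1) pairs and reduce sums the values grouped under each word.

```python
# Minimal illustration of the MapReduce contract (plain Python, not Hadoop API).
# map:    (key1, value1)       -> list of (key2, value2)
# reduce: (key2, [value2,...]) -> (key3, value3)

def map_fn(doc_id, text):
    """key1: document id, value1: document text; emit (word, 1) pairs."""
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    """key2: a word, values: all the 1s emitted for it; emit (word, total)."""
    return (word, sum(counts))

pairs = map_fn("doc1", "to be or not to be")
# The framework groups intermediate pairs by key before calling reduce:
print(reduce_fn("to", [v for k, v in pairs if k == "to"]))   # ('to', 2)
```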
  • 14. Example : Building Inverted Index
  map(key1, value1):          // key1: document id; value1: text of the document
      for each word w in value1:
          emit(w, key1)       // emit (word, docId)
  reduce(key2, values2):      // key2: a word w; values2: the list of docIds emitted for w
      initialize a list L
      for each docId d in values2:
          put d into L
      emit(key2, L)
  output: (word, a sorted list of docIds)
  Copyright © KAIST Database Lab. All Rights Reserved. 14
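The pseudocode can be exercised end to end with a tiny single-process simulation (an illustration only; function and variable names are my own): it runs map over every document, groups the emitted pairs by key to emulate the shuffle, and then applies reduce to each group.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # key1: document id, value1: document text -> emit (word, docId)
    for word in text.split():
        yield word, doc_id

def reduce_fn(word, doc_ids):
    # key2: a word, values: docIds -> emit (word, sorted list of docIds)
    return word, sorted(set(doc_ids))

def run_inverted_index(documents):
    """Emulate map -> shuffle/group -> reduce on one machine."""
    groups = defaultdict(list)                    # the shuffle: group values by key
    for doc_id, text in documents.items():
        for word, value in map_fn(doc_id, text):
            groups[word].append(value)
    return dict(reduce_fn(w, ids) for w, ids in groups.items())

print(run_inverted_index({"d1": "map reduce map", "d2": "reduce merge"}))
# {'map': ['d1'], 'reduce': ['d1', 'd2'], 'merge': ['d2']}
```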
  • 15. Map Stage in a M/R Job (diagram: each map task turns input key-value pairs into intermediate key-value pairs) Copyright © KAIST Database Lab. All Rights Reserved. 15
  • 16. Reduce Stage in a M/R Job (diagram: intermediate key-value pairs are grouped by key, and each reduce task turns a key-value group into output key-value pairs) Copyright © KAIST Database Lab. All Rights Reserved. 16
  • 17. Data in a M/R Job ♦ Input and final output are stored on DFS – Scheduler tries to schedule map tasks “close” to physical storage location of input data – Fault tolerance guaranteed by replicas. ♦ Intermediate results are stored on local disks of map and reduce workers for fault tolerance. – Until the next tasks are fully completed. ♦ Outputs of a M/R job often become inputs of another MR job – Consecutive or iterative M/R jobs for a single work » e.g. binary joins and PageRank Copyright © KAIST Database Lab. All Rights Reserved. 17
  • 18. Parallel Execution across Nodes 1. Partition input key/value pairs into blocks and then run map() tasks in parallel 2. After all map()s are complete, consolidate all emitted values for each unique emitted key 3. Now partition space of output map keys, and run reduce() in parallel 4. In reduce(), values for each key are grouped together then aggregated, reduced output are stored on DFS Copyright © KAIST Database Lab. All Rights Reserved. 18
  • 19. Combiner ♦ Often a map task will produce many pairs of the form (k,v1), (k,v2), … for the same key k – e.g., popular words in Word Count ♦ It can save many I/Os by pre-aggregating in mappers – combine(k1, list(v1)) → v2 – Usually the same as the reduce function; in-mapper aggregation ♦ Combiners work correctly only if the reduce function is commutative and associative Copyright © KAIST Database Lab. All Rights Reserved. 19
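A small sketch of the idea (illustrative only, not the Hadoop Combiner API): because word-count's reduce is a commutative and associative sum, the same aggregation can run inside each map task before the shuffle, so far fewer pairs cross the network.

```python
from collections import Counter

def map_with_combiner(doc_id, text):
    """Map task with in-mapper aggregation: combine (word, 1) pairs locally."""
    partial = Counter(text.split())          # combine(k, [1, 1, ...]) -> partial count
    return list(partial.items())             # one pair per distinct word leaves the mapper

def reduce_fn(word, partial_counts):
    """Reduce sums the partial counts; the final result is unchanged."""
    return word, sum(partial_counts)

print(map_with_combiner("d1", "to be or not to be"))
# [('to', 2), ('be', 2), ('or', 1), ('not', 1)] instead of six (word, 1) pairs
```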
  • 20. System Architecture (diagram: the input is split into blocks 1..n; mappers apply Map with a local sort and an optional Combiner and write intermediate results; after a barrier, reducers pull the data in the shuffle/copy phase, merge it, apply Reduce, and write the output) Copyright © KAIST Database Lab. All Rights Reserved. 20
  • 21. Sort-Merge in MapReduce (diagram: each map task locally sorts its key-value pairs, the sorted runs are shuffled to reducers, and each reducer merges the runs so that Reduce() sees all values of a key together) Copyright © KAIST Database Lab. All Rights Reserved. 21
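The merge step can be mimicked in a few lines (a sketch under the assumption of small in-memory runs; the real framework merges sorted spill files on disk): each map side produces a locally sorted run, and the reduce side merges the runs and groups equal keys so that Reduce() sees each key once with all of its values.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Locally sorted runs, one per map task (toy data).
run_from_map1 = sorted([("be", 1), ("to", 1), ("to", 1)])
run_from_map2 = sorted([("be", 1), ("or", 1)])

# Reduce side: merge the sorted runs, then group adjacent equal keys.
merged = heapq.merge(run_from_map1, run_from_map2, key=itemgetter(0))
for key, group in groupby(merged, key=itemgetter(0)):
    print(key, sum(v for _, v in group))      # be 2, or 1, to 2
```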
  • 22. Data Partitioning ♦ Inputs to map tasks are created by contiguous splits of input file ♦ For reduce, we need to ensure that records with the same intermediate key end up at the same worker ♦ System uses a default partition function e.g., hash(key) mod R ♦ Sometimes useful to override – e.g., hash(hostname(URL)) mod R ensures URLs from a host end up in the same output file Copyright © KAIST Database Lab. All Rights Reserved. 22
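Both partitioning rules fit in a few lines (a sketch; the function names are my own, and a digest-style hash is used so the mapping stays stable across processes):

```python
from urllib.parse import urlparse
from zlib import crc32

def default_partition(key, num_reducers):
    """Default rule: hash(key) mod R."""
    return crc32(key.encode("utf-8")) % num_reducers

def host_partition(url, num_reducers):
    """Override: hash(hostname(URL)) mod R, so one host -> one output file."""
    return crc32(urlparse(url).netloc.encode("utf-8")) % num_reducers

R = 4
print(default_partition("hello", R))
print(host_partition("http://example.com/a", R) ==
      host_partition("http://example.com/b", R))      # True: same host, same reducer
```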
  • 23. Fault Tolerance ♦ If tasks fail, the tasks are executed again in another node – Detect failure via periodic heartbeats – Re-execute in-progress map tasks – Re-execute in-progress reduce tasks ♦ If a node crashes: – Re-launch its current tasks on other nodes – Re-run any maps the node previously ran » Necessary because their output files were lost along with the crashed node ♦ If a task is going slowly (straggler): – Launch second copy of task on another node (“speculative execution”) – Take the output of whichever copy finishes first, and kill the other Copyright © KAIST Database Lab. All Rights Reserved. 23
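As a rough illustration of the straggler heuristic (a simplified policy of my own, not Hadoop's exact speculative-execution logic), a scheduler can compare every running task's progress with the average and launch backup copies for tasks that lag far behind:

```python
def pick_speculative_candidates(progress, lag_threshold=0.2):
    """progress maps task id -> fraction completed (0.0..1.0).
    Return tasks lagging the average by more than lag_threshold; the scheduler
    would launch a second copy of each and keep whichever finishes first."""
    if not progress:
        return []
    average = sum(progress.values()) / len(progress)
    return [task for task, done in progress.items() if average - done > lag_threshold]

running_tasks = {"map-0": 0.95, "map-1": 0.90, "map-2": 0.35}
print(pick_speculative_candidates(running_tasks))      # ['map-2']
```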
  • 24. Hadoop Distributed File System – An open-source clone of GFS – Write-once, read-many – Data partitioned into 64MB or 128MB blocks, each block replicated 3 times. – NameNode holds filesystem metadata – Files are broken up and spread over the DataNodes ♦ Hadoop MapReduce – Java implementation of MapReduce – JobTracker schedules and manages jobs – TaskTracker executes individual map() and reduce() tasks on each node. Copyright © KAIST Database Lab. All Rights Reserved. 24
  • 25. A Brief History of Apache Hadoop Branches & Releases *source: http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/ Copyright © KAIST Database Lab. All Rights Reserved. 25
  • 26. Hadoop 2 alpha : HDFS Federation MapReduce V2 aka. YARN Copyright © KAIST Database Lab. All Rights Reserved. 26
  • 27. Hadoop Ecosystem Copyright © KAIST Database Lab. All Rights Reserved. 27
  • 28. System Behavior on a Single Node *Source: A comparison of join Algorithms for log processing in MR, SIGMOD’10 Copyright © KAIST Database Lab. All Rights Reserved. 28
  • 29. Overall Execution Time *Source: A platform for scalable one-pass analytics using MapReduce, SIGMOD’11 Copyright © KAIST Database Lab. All Rights Reserved. 29
  • 30. Criticism ♦ D. DeWitt and M. Stonebraker sharply criticized MapReduce, calling it “a major step backwards”[5]. – They first regarded it as a simple Extract-Transform-Load tool. ♦ A technical comparison was done by Pavlo et al.[6] – Compared MapReduce with a commercial row-wise DBMS and Vertica – After that, technical debates between researchers and practitioners were triggered ♦ CACM welcomed this technical debate, inviting both sides in the Communications of the ACM, Jan 2010 [7,8] Copyright © KAIST Database Lab. All Rights Reserved. 30
  • 31. The Technical Debate Copyright © KAIST Database Lab. All Rights Reserved. 31
  • 32. *Source: A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD’09 Copyright © KAIST Database Lab. All Rights Reserved. 32
  • 33. Advantages ♦ Simple and easy to use – Users code only Map() and Reduce() – BashReduce implements MapReduce for standard Unix commands such as sort, awk, grep, join, etc. – Users need not consider how to distribute their job ♦ Flexible – No data model, no schema – Users can process any irregular or unstructured data with MapReduce ♦ Independent of the underlying storage ♦ Fault tolerance – Users need not worry about failures while a job is running – Google reported that MapReduce continues to work despite an average of 1.2 failures per job. ♦ High scalability – Yahoo! reported that Hadoop could scale out to more than 4,000 nodes in 2008. Copyright © KAIST Database Lab. All Rights Reserved. 33
  • 34. Pitfalls ♦ No high-level language – M/R itself does not support any high-level language or query optimization ♦ Schema-free and index-free – Impromptu data processing throws away the benefits of data modeling – Requires data parsing and full scans ♦ A single fixed dataflow – Ease of use comes from a simple abstraction – Complex algorithms are hard to phrase as a single M/R job. ♦ Low efficiency – Sacrifice of disk I/O for fault-tolerance » Result materialization on local disks in each step -> no pipelining » Three replicas on DFS – Block-level restarts; a simple heuristic runtime scheduling with speculative execution ♦ Blocking operators – Caused by merge-sort for grouping values – Reduce() begins only after all map tasks end ♦ Very young compared to over 40 years of database technology – Not mature yet, and third-party tools are still relatively few Copyright © KAIST Database Lab. All Rights Reserved. 34
  • 35. A Short List of Improvements
  ♦ Sacrifice of disk I/O for fault-tolerance – Main difference against DBMS – Compression techniques like LZO and snappy, … – Column-wise: RCFile, CIF, CoHadoop, DREMEL, …
  ♦ A single fixed dataflow – Dryad, SCOPE, Nephele/PACT, … – Map-Reduce-Merge for binary operators, Map-Join-Reduce and some join techniques – Twister & HaLoop for iterative workloads – Map-Join-Reduce & join algorithms in MapReduce
  ♦ No schema – Protocol buffers, JSON, Avro, …
  ♦ No indexing – HadoopDB, Hadoop++, Trojan, …
  ♦ No high-level language – Hive, Sawzall, SCOPE, Pig Latin, …, Jaql, Dryad/LINQ
  ♦ Blocking operators – MapReduce Online, Mortar, MR-Hash, …
  ♦ A simple heuristic scheduling / load balancing – LATE, Leen, SkewTune, …
  ♦ Relatively poor performance – Adaptive and automatic performance tuning – Work sharing / multiple jobs » MRShare, ReStore » Hive, Pig Latin » fair/capacity sharing, ParaTimer
  ♦ Cowork with other tools – SQL/MapReduce, HadoopDB, Teradata EDW’s Hadoop integration, Oracle in-database Hadoop, …
  ♦ DW based on MR – Cheetah, Osprey, Tenzing, RICARDO, …
  ♦ Optimization – Self-tuning, multi-query processing, query optimization
  ♦ Other complements
  35
  • 36. Data Skewness in M/R ♦ When skew arises, some partitions of an operation take longer to process, slowing down the entire computation. ♦ Types – (1) skew caused by an uneven distribution of input data to operator partitions (or tasks), and – (2) skew caused by some portions of the input data taking longer to process than others. ♦ Map-skew / reduce-skew (figure: a timing chart of a M/R job running the PageRank algorithm from Cloud 9) *A Study of Skew in MapReduce Applications, OpenCirrus’11 Copyright © KAIST Database Lab. All Rights Reserved. 36
  • 37. Simple Solution for Reduce-skew ♦ Assign ReduceIds to mapped outputs as keys, according to statistics acquired by sampling or scanning the input data – For skewed data, assign more ReduceIds for repartitioning ♦ Pros and cons – Simple; no modification of M/R internals – But it imposes additional overhead, i.e. a 1st M/R job to gather the statistics (diagram: a statistics-gathering M/R job followed by the real M/R job, whose map side assigns ReduceIds before the shuffle) Copyright © KAIST Database Lab. All Rights Reserved. 37
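A sketch of the two-pass idea (illustrative only; the names and the proportional-assignment rule are my own): the first statistics job samples key frequencies, a plan then gives heavy keys several ReduceIds while light keys keep one, and the real job spreads each heavy key's records across its ReduceIds.

```python
from collections import Counter
from random import randrange
from zlib import crc32

def build_plan(sampled_keys, num_reducers):
    """Statistics pass: give each key a number of reduce partitions roughly
    proportional to its sampled frequency (always at least one)."""
    freq = Counter(sampled_keys)
    total = sum(freq.values())
    return {k: max(1, round(num_reducers * c / total)) for k, c in freq.items()}

def assign_reduce_id(key, plan, num_reducers):
    """Real job: a heavy key is scattered over several ReduceIds,
    a light key always maps to the same one."""
    spread = plan.get(key, 1)
    return (crc32(key.encode("utf-8")) + randrange(spread)) % num_reducers

sample = ["a"] * 80 + ["b"] * 15 + ["c"] * 5          # key "a" dominates the input
plan = build_plan(sample, num_reducers=10)
print(plan)                                            # e.g. {'a': 8, 'b': 2, 'c': 1}
print({assign_reduce_id("a", plan, 10) for _ in range(200)})   # "a" lands on ~8 reducers
```

Because a heavy key now reaches several reducers, its reduce output is partial per reducer and needs one more (usually cheap) aggregation step afterwards.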
  • 38. SkewTune ♦ Key challenges – Require no extra input from the user, yet work for all M/R implementations – Impose minimal overhead if there is no skew. ♦ Detecting and mitigating skew at runtime – When a node becomes idle, it identifies the task with the greatest expected remaining processing time. – The unprocessed input data of the straggling task is then proactively repartitioned. Copyright © KAIST Database Lab. All Rights Reserved. 38
  • 39. Schema Support ♦ MapReduce parses each data record at read time – Causing performance degradation ♦ Serialization formats are used instead. – Google’s Protocol Buffers, JSON, Apache’s Thrift and Avro. ♦ Compression techniques are sometimes used together to reduce I/O costs. – LZO, snappy, … – Fast compression/decompression is sometimes more important than compression ratio. – Some processing can even run directly on compressed data, e.g. run-length encoding. Copyright © KAIST Database Lab. All Rights Reserved. 39
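A minimal stand-in for the point (my own example; JSON plus gzip substitutes here for Avro/Thrift with LZO or snappy): records are serialized once when written and the read side just deserializes them, instead of re-parsing ad-hoc text at every scan.

```python
import gzip
import json

records = [{"url": "http://example.com", "hits": 3},
           {"url": "http://example.org", "hits": 7}]

# Write side: serialize each record and compress the file to cut I/O volume.
with gzip.open("part-00000.json.gz", "wt", encoding="utf-8") as out:
    for record in records:
        out.write(json.dumps(record) + "\n")

# Read side (e.g. inside a map task): decompress and deserialize, no custom parsing.
with gzip.open("part-00000.json.gz", "rt", encoding="utf-8") as src:
    total_hits = sum(json.loads(line)["hits"] for line in src)

print(total_hits)   # 10
```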
  • 40. HadoopDB ♦ A hybrid system that connects multiple single site DBMS with M/R ♦ Combining M/R-style scalability and the performance of DBMS ♦ It utilizes M/R as a distributing system – Queries are written in SQL and distributed via M/R across nodes. – Data processing is boosted by the features of DBMS engines as much as possible. Copyright © KAIST Database Lab. All Rights Reserved. 40
  • 41. Twister and HaLoop Copyright © KAIST Database Lab. All Rights Reserved. 41
  • 42. Dryad ♦ Allows users to design and execute a dataflow graph as they wish ♦ A form of DAG – Vertex: a program that can run when all of its inputs are ready – Channel: a file, TCP pipe, or shared memory, … ♦ A logical dataflow graph is mapped onto physical resources by a job scheduler at runtime. Copyright © KAIST Database Lab. All Rights Reserved. 42
  • 43. MapReduce Online ♦ Addresses the issue that pull-based communication and checkpointing of mapped outputs limit pipelined processing. ♦ Supports online aggregation and continuous queries in MapReduce ♦ Mappers periodically push their temporarily stored data to reducers within the same M/R job. ♦ Enables approximate answers and stream processing – Can also reduce the response times of jobs (diagram: input blocks read from HDFS flow through map and reduce tasks, which write snapshot answers back to HDFS) Copyright © KAIST Database Lab. All Rights Reserved. 43
  • 44. Hash-based Partitioning ♦ Merge sort affects both the performance and the nature of M/R ♦ As soon as each map task outputs its intermediate results, the results are hashed and pushed into hash tables held by the reducers. ♦ Reducers perform aggregation on the values in each bucket. ♦ No grouping step is required ♦ On-the-fly aggregation is possible even before all mappers have completed. Copyright © KAIST Database Lab. All Rights Reserved. 44
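A sketch of the hash-based alternative (illustrative only): the reduce side keeps a hash table keyed by intermediate key and folds values in as mappers push them, so aggregation starts immediately and no sort-merge pass is needed.

```python
from collections import defaultdict

class HashAggregatingReducer:
    """Fold pushed (key, value) pairs into per-key buckets as they arrive."""

    def __init__(self):
        self.buckets = defaultdict(int)

    def consume(self, key, value):
        self.buckets[key] += value            # on-the-fly aggregation, no grouping step

    def snapshot(self):
        return dict(self.buckets)             # usable even before all mappers finish

reducer = HashAggregatingReducer()
for pair in [("to", 1), ("be", 1), ("to", 1)]:   # pairs pushed by (possibly unfinished) mappers
    reducer.consume(*pair)
print(reducer.snapshot())                        # {'to': 2, 'be': 1}
```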
  • 45. Column-wise Storages ♦ Column-oriented storage formats – Read-optimized: unnecessary columns are not read during table scans – They outperform the other structures in most cases. – RCFile uses column-wise compression and thus provides efficient storage space utilization. ♦ Systems – Google’s BigTable, Dremel – Facebook’s Record Columnar File (RCFile), CIF, Llama, … (diagram: CIF and RCFile layouts) Copyright © KAIST Database Lab. All Rights Reserved. 45
  • 46. Block Collocation Copyright © KAIST Database Lab. All Rights Reserved. 46
  • 47. Sharing Multiple Query Executions in M/R ♦ Inspired by multi-query optimization techniques in databases ♦ Three sharing opportunities across multiple M/R jobs: – scan sharing, mapped-output sharing, and Map function sharing ♦ Intermediate result sharing improves the execution time significantly. ♦ Sharing all scans may yield poor performance because of the cost of merge-sort. – Without scan sharing, n M/R jobs over a |D|-size input scan n*|D| data and each sorts only |D|, for a sort cost of n*|D|*log|D| – With scan sharing, the input is scanned once, but the combined sort costs n*|D|*log(n*|D|), compared to n*|D|*log|D| Copyright © KAIST Database Lab. All Rights Reserved. 47
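The caveat in the last bullet can be checked with a quick back-of-the-envelope computation using the cost shapes from the slide (the constants below are arbitrary illustration values, not measurements): a shared scan reads the input once but sorts n·|D| pairs at once, so for large n the bigger sort can outweigh the saved scans.

```python
import math

def toy_cost(n, D, share_scan):
    """Toy model: scanning is linear in the data read, sorting m pairs costs m*log2(m)."""
    if share_scan:
        scan = D                                  # input scanned once for all n jobs
        sort = n * D * math.log2(n * D)           # one combined merge-sort over n*|D| pairs
    else:
        scan = n * D                              # each job scans the input itself
        sort = n * D * math.log2(D)               # each job sorts only its own |D| pairs
    return scan + sort

n, D = 16, 1_000_000
print(f"no sharing:  {toy_cost(n, D, share_scan=False):.3e}")
print(f"scan shared: {toy_cost(n, D, share_scan=True):.3e}")
# With these numbers the shared plan is the more expensive one, which is
# exactly the slide's warning about sharing all scans.
```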
  • 48. Reusing Results of M/R Jobs ♦ The current practice is to delete intermediate results from the DFS ♦ It would be more useful if these intermediate results could be stored and reused in future M/R jobs. ♦ ReStore matches the input workflows of M/R jobs against previously executed jobs and rewrites the workflows to reuse the stored results of the matched jobs. Copyright © KAIST Database Lab. All Rights Reserved. 48
  • 49. Research Challenges and Issues ♦ Parallelizing conventional algorithms – Finding algorithms that are easy to phrase as M/R jobs – Or developing new algorithms that fit the model – Not good for ad-hoc queries ♦ Performance improvements – Modern HW features are not well utilized » Multi-core, GPGPU, SSD, etc. – Some caveats still exist in the model » e.g. iterative and incremental processing – Self-tuning » 150+ tuning knobs in Hadoop » Long-running analysis and batch processing – Real-time MapReduce Copyright © KAIST Database Lab. All Rights Reserved. 49
  • 50. Summary ♦ M/R is simple, but provides good scalability and fault-tolerance for massive data processing ♦ M/R does not substitute for DBMSs ♦ M/R complements DBMSs with scalable and flexible parallel processing for various data analyses ♦ The I/O efficiency of MapReduce still needs to be addressed for more successful applications – sort-merge based grouping and frequent checkpointing ♦ Many application domains and room for improvements Copyright © KAIST Database Lab. All Rights Reserved. 50
  • 51. Thank you! Questions or comments? Copyright © KAIST Database Lab. All Rights Reserved. 51
  • 52. References 1. David A. Patterson. Technical perspective: the data center is the computer. Communications of ACM, 51(1):105, 2008. 2. Hadoop. users List; http://wiki.apache.org/hadoop/PoweredBy 3. Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data, Processing on Large Clusters, In Proceedings of OSDI 2004 and CACM Vol. 51, No. 1 pp. 107-113, 2008 4. S. Ghemawat and et al. The Google File System, ACM SIGOPS Operating Systems Review, Vol. 37, No. 5 pp. 29- 43, 2003 5. David J. DeWitt and Michael Stonebraker, MapReduce: a major step backwards, Database column blog, 2008 6. Andrew Pavlo and et al. A Comparison of Approaches to Large-Scale Data Analysis, In Proceedings of SIGMOD 2009 7. Michael Stonebraker and et al. MapReduce and Parallel DBMSs: Friends or Foes?, Communications of ACM, Vol 53, No. 1 pp. 64-71, Jan 2010 8. Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, No. 1 pp. 72-72 Jan 2010 9. M. Stonebraker, The case for shared-nothing. Data Engineering Bulletine, 9(1):4-9, 1986 10. D. DeWitt and J. Gray, Parallel database systems: the future of high performance database systems, Communications of the ACM 35(6):85-98, 1992 11. B. Schroeder and et a. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you. In Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST), pages 1–16, 2007. 12. B. Schroeder and et al. DRAM errors in the wild: a large-scale field study. In Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, pages 193–204. ACM New York, NY, USA, 2009 Copyright © KAIST Database Lab. All Rights Reserved. 52
  • 53. 13. G.M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483–485. ACM, 1967. 14. J.L. Gustafson. Reevaluating Amdahl’s law. Communications of the ACM, 31(5):532–533, 1988. 15. A.H. Karp and H.P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539–543, 1990. 16. Apache Foundation, MapReduce V0.21.0 Tutorial, http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html, 2010 17. Incremental MapReduce, TV’s cobweb blog, http://eagain.net/articles/incremental-mapreduce/ 18. Y. Bu and et al. HaLoop: Efficient Iterative Data Processing on Large Clusters, In Proceedings of VLDB’10 19. J. Ekanayake and et al. Twister: A Runtime for Iterative MapReduce, In Proceedings of ACM HPDC’10 pp. 810-818, 2010 20. M. Isard and et al. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, page 72. ACM, 2007. 21. R. Chaiken and et al. Scope: easy and efficient parallel processing of massive data sets. PVLDB: Proceedings of Very Large Data Base Endowment, 1(2):1265–1276, 2008. 22. C. Olston and et al. Pig Latin: a not-so-foreign language for data processing. In SIGMOD ’08: Proceedings of ACM SIGMOD Conference, pages 1099–1110, 2008. 23. A. Gates and et al. Building a high level dataflow system on top of MapReduce: The pig experience. PVLDB: In Proceedings of VLDB, 2(2):1414–1425, 2009. 24. R. Pike and et al. Interpreting the Data: Parallel Analysis with Sawzall, Scientific Programming, Vol. 13 No. 4, pp. 277-298, 2005 25. A. Thusoo and et al. Hive- A Warehousing Solution over a Map-Reduce Framework. PVLDB: Proceedings of Very Large Data Base Endowment, 2009 26. A. Thusoo and et al. Hive - a petabyte scale data warehouse using hadoop. In Proceedings of ICDE 2010 Copyright © KAIST Database Lab. All Rights Reserved. 53
  • 54. 27. Y. Yu and et al. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI ’08: Proceedings of Symposium on Operating System Design and Implementation, 2008 28. M. Isard and et al. Distributed Data-Parallel Computing Using a High-Level Programming Language, In Proceedings of SIGMOD 2009 29. D. Logothetis and et al. Ad-Hoc Data Processing in the Cloud, In Proceedings of VLDB’08 30. T. Condie and et al. MapReduce Online, In Proceedings of USENIX NSDI, 2010 31. A. Alexandrov and et al. Massively Parallel Data Analysis with PACTs on Nephele, In Proceedings of VLDB Vol. 3 No.2, 2010 32. Battr{'e}, D and et al. Nephele/PACTs: a programming model and execution framework for web-scale analytical processing, In Proceedings of SoCC 2010 33. Eric Friedman and et al. SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user defined functions. PVLDB: PVLDB: Proceedings of Very Large Data Base Endowment, 2(2):1402–1413, 2009. 34. A. Abouzeid and et al. HadoopDB: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. VLDB’09: Proceedings of Very Large Data Base Endowment, pages 1084–1095, 2009. 35. Y. Xu and et al. Integrating Hadoop and Parallel DBMS, In Proceedings of ACM SIGMOD, pp. 969-974, 2010 36. S. Das and et al. Ricardo: Integrating R and Hadoop, In Proceedings of ACM SIGMOD pp. 987-998, 2010 37. J. Dittrich and et al. Hadoop++ Making a Yellow Elephant Run like a Cheetah (Without it Even Noticing), In Proceedings of VLDB’10 38. S. Chen, Cheetah: A High Performance Custom Data Warehouse on top of MapReduce, In Proceedings of VLDB, Vol. 3, No. 2, 2010 39. S. Melnik and et al. Dremel: Interactive Analysis of Web-Scale Datasets, In Proceedings of VLDB VOl 3. No .1, 2010 Copyright © KAIST Database Lab. All Rights Reserved. 54
  • 55. 40. C. Yang and et al. Osprey-Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Databasem, In Proceedings of IEEE ICDE pp. 657-668, 2010 41. M. Zaharia and et al. Improving MapReduce Performance in Heterogeneous Environments, In Proceedings of USENIX OSDI’08 42. H. Yang, and et al., Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, In Proceedings of SIGMOD’07 43. D. Jiang and et al. Map-Join-Reduce: Towards Scalable and Efficient Data Analysis on Large Clusters, IEEE Transactions on Knowledge and Data Engineering, preprint 44. S. Blanas and et al. A Comparison of Join Algorithms for Log Processing in MapReduce, In Proceedings of SIGMOD’10 45. F. N. Afrati and et al. Optimizing Joins in a Map-Reduce Environment, in Proceedings of EDBT 2010 46. R. Vernica and et al. Efficient Parallel Set-Similarity Joins Using MapReduce, In Proceedings of SIGMOD’10 47. T. Nykiel and et al. MRShare: Sharing Across Multiple Queries in MapReduce, In Proceedings of VLDB’10 48. K. Morton and et al. Estimating the progress of MapReduce Pipelines, In Proceedings of IEEE ICDE pp. 681-684, 2010 49. K. Morton and et al. ParaTimer: A Progress Indicator for MapReduce DAGs, In Proceedings of ACM SIGMOD, pp. 507-518, 2010 50. S. Papadimitriou and et al. DisCo: Distributed Co-clustering with Map-Reduce, In Proceedings of IEEE ICDM pp. 512-521, 2009 51. C. Wang and et al. MapDupReducer : detecting near duplicates over massive datasets, In Proceedings of ACM SIGMOD pp. 1119-1122, 2010 52. S. Babu, Towards Automatic Optimization of MapReduce Programs, In Proceedings of ACM SoCC’10 53. D. Jiang and et al. The Performance of MapReduce: An In-depth Study, In Proceedings of VLDB’10 54. E. Jahani and et al. Automatic Optimization for MapReduce Programs, Proceedings of VLDB Vol.4, No. 6 , 2011 Copyright © KAIST Database Lab. All Rights Reserved. 55
  • 56. 55. B. Catanzaro and et al. A Map Reduce Framework for Programming Graphic Processors, In Proceedings of Workshop on Software Tools for Multicore Systems, 2008 56. B. He and et al. Mars: A MapReduce framework on graphic processors, In Proceedings of PACT’10 pp. 260-269, 2008 57. W. Jiang and et al. A Map-Reduce System with an Alternate API for Multi-Core Environments, In Proceedings of 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010 58. Jeff Dean, Design, Lessons, Advices from Building Large Distributed System, Keynote , LADIS 2009. 59. Willis Lang and et al. Energy Management for MapReduce Clusters, Proceedings of VLDB Vol. 3 No. 1, 2010 60. W. Xiong and et al. Energy Efficient Data Intensive Distributed Computing, Data Engineering Bulletin Vol. 34, No. 1, pp. 24-33, March 2011 61. E. Anderson and et al. Efficiency Matters!, ACM SIGOPS Operating Systems Review, 44(1):40-45, 2010 62. Jimmy Lin and Chris Dyer, Data-Intensive Text Processing, Book 63. G. Malewicz and et al. Pregel: A System for Large-Scale Graph Processing, In Proceedings of PODC’09 64. J. Ekanayake and et al. MapReduce for Data Intensive Scientific Analyses, In Proceedings of IEEE eScience’08 65. K. B. Hall and et al. MapReduce/BigTable for Distributed Optimization , NIPS LCCC Workshop 2010 66. PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce 67. MC Schatz, CloudBurst: Highly sensitive read mapping with MapReduce, Bioinformatics, Vol 25, No. 11 68. B. Fan and et al. DiskReduce: RAID for data-intensive scalable computing, In Proceedings of the 4th Annual workshop on Petascale Data Storage, pp. 6-10, 2009 69. K. Lee and et al. Parallel data processing with MapReduce: a survey, The SIGMOD Record, Vol 40, No. 4, pp.11-20, 2011 Copyright © KAIST Database Lab. All Rights Reserved. 56