Architect's view of Hadoop I/O
I/O analysis using vProbes
Richard McDougall
V1.0
April 2012
Architect's Questions
•  Does Hadoop really need compute and data to be local?
•  How much ephemeral data, and at what I/O rates, do we need to design for?
•  What I/O patterns do we need to support for HDFS?
•  What is the I/O pattern of M-R tasks?
•  Are there opportunities for caching: map input, output, or ephemeral data?
Controlled Small Study
•  Focus on developing tooling
•  Using vProbes + Perl + R
•  Hadoop 0.20.204
•  Terasort @ 1GB
•  One Namenode, Tasktracker, Datanode
Terasort
[Diagram: Input File -> Input Splits (x16) -> Map Tasks (sort chunk of key-values) -> Shuffle (shuffle output to reducers) -> Reduce (Sort) (combine and sort) -> Output File]
Log of the sort ‘Job’
$ log.pl job_201201261301_0005_1327649126255_rmc_TeraSort
            Item   Time Jobname            Taskname Phase Start-Time End-Time Elapsed
             Job   0.000 201201261301_0005
             Job         201201261301_0005
             Job   0.475 201201261301_0005 PREP
            Task   1.932 201201261301_0005 m_000017 SETUP
      MapAttempt   3.066 201201261301_0005 m_000017 SETUP
      MapAttempt  10.409 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.409 8.477 "setup"
            Task  10.966 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.966 9.034
             Job         201201261301_0005 RUNNING
            Task  10.970 201201261301_0005 m_000000 MAP
            Task  10.972 201201261301_0005 m_000001 MAP
      MapAttempt  10.981 201201261301_0005 m_000000 MAP
      MapAttempt  65.819 201201261301_0005 m_000000 MAP SUCCESS 10.970 65.819 54.849 ""
            Task  68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093
      MapAttempt  10.998 201201261301_0005 m_000001 MAP
      MapAttempt  65.363 201201261301_0005 m_000001 MAP SUCCESS 10.972 65.363 54.391 ""
            Task  68.065 201201261301_0005 m_000001 MAP SUCCESS 10.972 68.065 57.093
            Task  68.066 201201261301_0005 m_000002 MAP
            Task  68.067 201201261301_0005 m_000003 MAP
            Task  68.068 201201261301_0005 r_000000 REDUCE
      MapAttempt  68.075 201201261301_0005 m_000002 MAP
      MapAttempt 139.789 201201261301_0005 m_000002 MAP SUCCESS 68.066 139.789 71.723 ""
            Task 140.193 201201261301_0005 m_000002 MAP SUCCESS 68.066 140.193 72.127
      MapAttempt  68.076 201201261301_0005 m_000003 MAP
      MapAttempt 139.927 201201261301_0005 m_000003 MAP SUCCESS 68.067 139.927 71.860 ""
            Task 140.198 201201261301_0005 m_000003 MAP SUCCESS 68.067 140.198 72.131
…
   ReduceAttempt  68.112 201201261301_0005 r_000000 REDUCE
   ReduceAttempt 795.299 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 795.299 727.231 "reduce > reduce"
            Task 798.223 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 798.223 730.155
            Task 798.226 201201261301_0005 m_000016 CLEANUP
      MapAttempt 798.241 201201261301_0005 m_000016 CLEANUP
      MapAttempt 806.113 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 806.113 7.887 "cleanup"
            Task 807.252 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 807.252 9.026
             Job 807.253 201201261301_0005 SUCCESS 0.000 807.253 807.253
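A log in this layout can be summarized per phase with a few lines of script. The sketch below is an illustration in Python, not part of the original tooling (which is Perl). It assumes the whitespace-separated columns shown above, and the function name summarize is hypothetical. The sample holds only the completed Task rows visible in the excerpt, so the MAP total covers four of the sixteen map tasks.

```python
from collections import defaultdict

# Completed Task rows copied from the log excerpt above (remaining maps are elided there).
SAMPLE = """\
            Task  10.966 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.966 9.034
            Task  68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093
            Task  68.065 201201261301_0005 m_000001 MAP SUCCESS 10.972 68.065 57.093
            Task 140.193 201201261301_0005 m_000002 MAP SUCCESS 68.066 140.193 72.127
            Task 140.198 201201261301_0005 m_000003 MAP SUCCESS 68.067 140.198 72.131
            Task 798.223 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 798.223 730.155
            Task 807.252 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 807.252 9.026
"""


def summarize(text):
    """Return {phase: (task_count, total_elapsed_seconds)} for completed tasks."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: Item Time Jobname Taskname Phase SUCCESS Start End Elapsed
        if len(fields) == 9 and fields[0] == "Task" and fields[5] == "SUCCESS":
            phase = fields[4]
            start, end = float(fields[6]), float(fields[7])
            totals[phase][0] += 1
            totals[phase][1] += end - start
    return {phase: (count, round(total, 3)) for phase, (count, total) in totals.items()}


if __name__ == "__main__":
    for phase, (count, seconds) in summarize(SAMPLE).items():
        print(f"{phase:8s} tasks={count} elapsed={seconds:.3f}s")
```

Even on this partial sample the single reducer dominates: about 730 seconds against roughly 57 to 72 seconds per map task.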
Terasort: Map and Reduce Phases
[Figure: Setup Map, Mappers, Reducer, and Cleanup Map plotted against Elapsed Time - Seconds]
Terasort: Map and Reduce Phases
[Figure: Setup Map, Mappers, Reducer, and Cleanup Map plotted against Elapsed Time - Seconds, with callouts "Zoom in on Map Task I/O" and "Zoom in on Reduce Task I/O"]
VMware vProbes
•  Dynamic Instrumentation
•  Probe multiple VMs
•  Probe Virtualization Layer
•  VMware Fusion and Workstation
vProbes

GUEST:ENTER:system_call {
    string path;
    comm = curprocname();
    tid = curtid();
    pid = curpid();
    ppid = curppid();
    syscall_num = sysnum;

    if(syscall_num == NR_open) {
       path = guestloadstr(sys_arg0);
       syscall_name = "open";
       sprintf(syscall_args, "\"%s\", %x, %x", path, sys_arg1, sys_arg2);
    …
}

GUEST:OFFSET:ret_from_sys_call:0 {
    printf("%s/%d/%d/%d %s(%s) = %d <0>\n", comm, pid, rtid, ppid, syscall_name,
                                               syscall_args, getgpr(REG_RAX));
}

java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>
java/14774/15467/1 stat("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 7f0b80a4e590) = 0 <0>
java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>
Pathname Resolution
filetracevp.pl:

if ($syscall =~ m/open/) {
                $path1 = $line;
                $path1 =~ s/[A-z\/0-9]+[ ]+[a-z]+\("([^"]+)".*\n/$1/;
                $fd1 = $line;
                if ($fd1 =~ s/.* ([0-9]+) <.*>\n/$1/) {
                        $fds{$pid,$fd1} = $path1;

if ($syscall =~ m/write/) {
                $params = $line;
                if ($params =~ s/^[A-z\/0-9]+[ ]+[a-z]+\(([0-9]+),.* ([0-9]+)\) = ([0-9]+) <(.*)>\n/$1,$2,$3,$4/) {
                        ($fd1, $size, $bytes, $lat) = split(',', $params);
                        $path1 = $fds{$pid, $fd1};
…

java,14774,15467,,open,0,0,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,stat,0,0,0,0,0,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
java,14774,15467,,read,4096,167,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
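The pathname resolution step amounts to keeping a table from (pid, fd) to path: an open records the path under the descriptor it returns, and later reads and writes on that descriptor look the path up. The following is a minimal Python sketch of that idea, not a port of filetracevp.pl. The regular expression, the function name resolve_paths and the output field names are assumptions made for illustration, and the fd and size arguments are assumed to be printed in decimal, as the Perl patterns above suggest.

```python
import re

# Assumed trace line shape: comm/pid/tid/ppid name(args) = ret <latency>
LINE_RE = re.compile(
    r'^(?P<comm>[^/\s]+)/(?P<pid>\d+)/(?P<tid>\d+)/(?P<ppid>\d+) '
    r'(?P<name>\w+)\((?P<args>.*)\) = (?P<ret>-?\d+) <(?P<lat>[^>]*)>$'
)
PATH_RE = re.compile(r'^"([^"]*)"')  # a quoted first argument is taken to be a path

TRACE = [
    'java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/'
    'blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>',
    'java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>',
]


def resolve_paths(lines):
    """Attach a pathname to each traced syscall using a (pid, fd) -> path table."""
    fds = {}
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not look like trace records
        pid, name, args, ret = m["pid"], m["name"], m["args"], int(m["ret"])
        path, fd, size = "", None, 0
        quoted = PATH_RE.match(args)
        if quoted:
            path = quoted.group(1)
            if name == "open" and ret >= 0:
                fd = ret                  # open returns the new descriptor
                fds[(pid, fd)] = path
        elif name in ("read", "write"):
            parts = [a.strip() for a in args.split(",")]
            fd, size = int(parts[0]), int(parts[-1])  # first arg fd, last arg size
            path = fds.get((pid, fd), "")
        rows.append({"comm": m["comm"], "pid": pid, "syscall": name,
                     "size": size, "ret": ret, "fd": fd, "path": path})
    return rows


if __name__ == "__main__":
    for row in resolve_paths(TRACE):
        print(row)
```

A fuller version would also drop table entries on close and handle dup and fork, which this sketch leaves out.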
Controlled Small-Scale Study
  $ hadoop jar hadoop-examples-0.20.204.0.jar teragen 10000000 teradata
  <begin trace>
  $ hadoop jar hadoop-examples-0.20.204.0.jar terasort teradata teraout

Job Counters
      Launched reduce tasks=1
      SLOTS_MILLIS_MAPS=1146887
      Launched map tasks=16
      Data-local map tasks=16
      SLOTS_MILLIS_REDUCES=766823
    File Input Format Counters
      Bytes Read=1000057358
    File Output Format Counters
      Bytes Written=1000000000
    FileSystemCounters
      FILE_BYTES_READ=2382257412
      HDFS_BYTES_READ=1000059070
      FILE_BYTES_WRITTEN=3402627838
      HDFS_BYTES_WRITTEN=1000000000
    Map-Reduce Framework
      Map output materialized bytes=1020000096
      Map input records=10000000
      Reduce shuffle bytes=1020000096
      Spilled Records=33355441
      Map output bytes=1000000000
      Map input bytes=1000000000
      Combine input records=0
      SPLIT_RAW_BYTES=1712
      Reduce input records=10000000
      Reduce input groups=10000000
      Combine output records=0
      Reduce output records=10000000
      Map output records=10000000

File category                                  MB
Hadoop Distro                                 236
Hadoop Logs                                   132
Hadoop clienttmp unjar                          1
Mappers files jobcache - spills              1753
Mappers files jobcache - output              1777
Reducer Intermediate                          764
Reducers Shuffle and Intermediate            1744
Jobcache class files and shell scripts          1
Hadoop Datanode                              1690
JVM - /usr/lib/jvm…                            98
Total MB                                     7987

[Figure: bar chart of MB by file category, axis 0 to 2000]
Hadoop I/O Model
(With some data from early observations)
[Diagram: Job -> Map Tasks -> Map Output (file.out) -> Shuffle (Map_*.out) -> Reduce: Combine (Intermediate.out), Sort, Spills -> DFS Output Data. Map Tasks read DFS Input Data and write Spills & Logs (spill*.out). DFS data resides on HDFS.]
Bandwidth annotations: DFS Input Data 12% of bandwidth; spill, map output, shuffle and intermediate files 75% of disk bandwidth; DFS Output Data 12% of bandwidth.
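The slides do not say how the 75% and 12% figures were obtained. One way to sanity-check them is against the FileSystemCounters reported for the Terasort job, treating FILE_BYTES_* as local temporary I/O and HDFS_BYTES_* as DFS input and output. That mapping is an assumption made for this sketch, and the function name io_shares is hypothetical.

```python
# Counter values copied from the FileSystemCounters section of the Terasort job output.
COUNTERS = {
    "FILE_BYTES_READ": 2382257412,
    "FILE_BYTES_WRITTEN": 3402627838,
    "HDFS_BYTES_READ": 1000059070,
    "HDFS_BYTES_WRITTEN": 1000000000,
}


def io_shares(counters):
    """Return each category's fraction of all bytes moved, as counted by Hadoop."""
    local = counters["FILE_BYTES_READ"] + counters["FILE_BYTES_WRITTEN"]
    total = local + counters["HDFS_BYTES_READ"] + counters["HDFS_BYTES_WRITTEN"]
    return {
        "local_temp": local / total,                         # spills, map output, shuffle
        "hdfs_read": counters["HDFS_BYTES_READ"] / total,    # DFS input data
        "hdfs_write": counters["HDFS_BYTES_WRITTEN"] / total,  # DFS output data
    }


if __name__ == "__main__":
    for name, share in io_shares(COUNTERS).items():
        print(f"{name:11s} {share:6.1%}")
```

With these counters the local temporary share comes out near 74% and each HDFS direction near 13%, which is close to the 75% / 12% / 12% split in the model. The slide figures may instead come from the vProbes trace, so the agreement is indicative rather than a derivation.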
One Mapper Task: Temp Data
path                                                                                                                                        bytes
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/file.out       67586124
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill1.out     52762519
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill0.out     52508540
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill2.out     29698564
/usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar                                                                                                     5057763
/home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar                                                                                   895582
/home/rmc/untars/hadoop-0.20.204.0/lib/log4j-1.2.15.jar                                                                                           82522
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-lang-2.4.jar                                                                                       70477
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-configuration-1.6.jar                                                                              61007
/usr/lib/x86_64-linux-gnu/gconv/gconv-modules                                                                                                     51772
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/job.xml                                                        44420
/home/rmc/untars/hadoop-0.20.204.0/lib/commons-collections-3.2.1.jar                                                                              29974
/host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/job.xml                   21695
/usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libnio.so                                                                                               15946
/home/rmc/untars/hadoop-0.20.204.0/conf/core-site.xml                                                                                             11024
/usr/lib/jvm/java-6-openjdk/jre/lib/security/java.security                                                                                        10081
/proc/self/maps                                                                                                                                    7523
One Mapper Task: Temp I/O Counts
[Figure: histograms of Number of I/Os by I/O Size Bucket (1 to 131072): all I/O (axis 0 to 60000), Write I/O (axis 0 to 30000), and Read I/O (axis 0 to 30000). I/O measured at syscall.]
One Mapper Task: Tmp Bytes Transferred
[Figure: two histograms of Bytes by I/O Size Bucket. Left: I/O measured at syscall, buckets 1 to 131072, axis 0 to 2.5e+08 bytes. Right: Logical I/O (sequential grouping of syscalls), buckets 1 to 134217728, axis 0 to 6e+07 bytes.]
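The difference between the two views is easy to reproduce on a toy trace. The sketch below groups consecutive reads or writes on the same file into one logical I/O and places sizes into power-of-two buckets. The grouping rule (a run ends when the operation changes or another call such as lseek intervenes), the round-up bucketing, and the names bucket and logical_ios are all assumptions for illustration; the slides do not spell out the exact rules used.

```python
from collections import Counter


def bucket(size):
    """Smallest power of two >= size, for size >= 1 (assumed bucketing rule)."""
    return 1 << (size - 1).bit_length()


def logical_ios(events):
    """Merge runs of same-direction read/write calls per path into logical I/Os.

    events: iterable of (path, op, size). Any op other than read/write on a
    path, or a change of direction, closes that path's current run.
    """
    open_runs = {}  # path -> [op, total_bytes]
    done = []
    for path, op, size in events:
        run = open_runs.get(path)
        if op in ("read", "write") and run and run[0] == op:
            run[1] += size  # extend the current sequential run
            continue
        if run:
            done.append((path, run[0], run[1]))
            del open_runs[path]
        if op in ("read", "write"):
            open_runs[path] = [op, size]
    done.extend((path, run[0], run[1]) for path, run in open_runs.items())
    return done


# Hypothetical events; file names echo the slides but the sizes are made up.
EVENTS = (
    [("spill0.out", "write", 4096)] * 3
    + [("spill0.out", "lseek", 0)]
    + [("spill0.out", "read", 65536)] * 2
    + [("file.out", "write", 100)]
)

if __name__ == "__main__":
    per_syscall = Counter(bucket(s) for _, op, s in EVENTS if op in ("read", "write"))
    per_logical = Counter(bucket(s) for _, _, s in logical_ios(EVENTS))
    print("syscall buckets:", dict(per_syscall))
    print("logical buckets:", dict(per_logical))
```

Three 4096-byte writes count as three small I/Os at the syscall level but as one 12288-byte logical write, which is why the logical histogram extends to much larger buckets.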
Reducer Task: Temp Data
Reducer Task: Temp I/O Counts
[Figure: histograms of Number of I/Os by I/O Size Bucket: all I/O (buckets 1 to 131072, axis 0 to 4e+05), Write I/O (axis 0 to 80000), and Read I/O (axis 0 to 300000). I/O measured at syscall.]
                        8192
                                                                                                                                     16384
                 16384
                                                                                                                                     32768
                 32768
                                                                                                                                     65536
                 65536
                                                                                                                          131072
     131072
Reducer Task: Tmp Bytes Transferred
[Figure: bytes transferred per I/O size bucket, in two panels. Left: "I/O measured at syscall", buckets from 1 to 131072, Bytes axis from 0 to 1.5e+09. Right: "Logical I/O (sequential grouping of syscalls)", buckets from 1 to 8388608, Bytes axis from 0 to 5e+08.]
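The slides do not show how the "logical I/O" view is computed from the syscall trace. The following is a minimal sketch of one plausible approach, assuming each traced syscall is available as a `(fd, op, offset, size)` tuple: consecutive calls on the same file and operation that start exactly where the previous one ended are merged into one logical I/O, and bytes are then summed per power-of-two size bucket. The tuple layout and the function names are hypothetical and are not taken from the original vProbes, Perl or R tooling.

```python
from collections import Counter


def bucket(size):
    """Round a positive size down to its power-of-two bucket (5000 -> 4096)."""
    return 1 << (size.bit_length() - 1)


def logical_ios(syscalls):
    """Merge sequential syscalls into logical I/Os.

    syscalls: iterable of (fd, op, offset, size) tuples in trace order.
    Returns a list of (fd, op, start_offset, total_size) tuples.
    """
    open_runs = {}  # (fd, op) -> [start_offset, length] of the run in progress
    done = []
    for fd, op, offset, size in syscalls:
        run = open_runs.get((fd, op))
        if run is not None and run[0] + run[1] == offset:
            run[1] += size  # continues exactly where the last call ended
        else:
            if run is not None:
                done.append((fd, op, run[0], run[1]))
            open_runs[(fd, op)] = [offset, size]
    for (fd, op), run in open_runs.items():
        done.append((fd, op, run[0], run[1]))
    return done


def bytes_by_bucket(sizes):
    """Sum the bytes transferred in each power-of-two size bucket."""
    hist = Counter()
    for s in sizes:
        hist[bucket(s)] += s
    return dict(hist)


# Hypothetical trace: four back-to-back 64KB reads, then one unrelated 4KB read.
trace = [
    (3, "read", 0, 65536),
    (3, "read", 65536, 65536),
    (3, "read", 131072, 65536),
    (3, "read", 196608, 65536),
    (3, "read", 1048576, 4096),
]

syscall_view = bytes_by_bucket(size for _, _, _, size in trace)
logical_view = bytes_by_bucket(size for _, _, _, size in logical_ios(trace))
```

In this toy trace the syscall view puts 262144 bytes in the 65536 bucket, while the logical view moves the same bytes to the 262144 bucket. This is the kind of shift toward larger buckets that the right-hand panels display.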
Datanode – Bytes Transferred
[Figure: bytes transferred per I/O size bucket, in panels labelled "I/O Size Bucket", "Read I/O Size Bucket" and "Write I/O Size Bucket"; y-axis in Bytes, buckets are powers of two from 1 to 131072.]
Datanode – Actual vs. Logical I/O Size
[Figure: bytes transferred per I/O size bucket, in two panels. Left: "I/O measured at syscall", buckets from 1 to 131072. Right: "Logical I/O (sequential grouping of syscalls)", buckets from 1 to 134217728. Y-axis in Bytes.]
Datanode – IOPS
[Figure: number of I/Os per size bucket, in panels labelled "I/O Size Bucket" (with "Number of I/Os" and "Bytes" axes), "Write I/O Size Bucket" and "Read I/O Size Bucket" (with "Number of I/Os" axes); buckets are powers of two from 1 to 131072.]
Back of the Envelope Modeling
•  How much bandwidth does Terasort need?
    –  10 seconds of CPU/core time per task
    –  128MB of HDFS I/O per task
    –  ~3x, i.e. 384MB, of temporary data per task


I/O Component   Per-task   Per-task Bandwidth   Per-host (24 cores)
HDFS I/O        128MB      ~13 MBytes/sec       312 MBytes/sec
Temp            384MB      ~38 MBytes/sec       912 MBytes/sec
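
The table above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are made up here, and it assumes each task spreads its I/O evenly over its 10 seconds of CPU time and that all 24 cores run one task each.

```python
# Back-of-the-envelope bandwidth model for Terasort tasks.
# Input figures come from the slide; all names are illustrative assumptions.

CPU_SECONDS_PER_TASK = 10                 # CPU/core time per task
HDFS_MB_PER_TASK = 128                    # HDFS I/O per task
TEMP_MB_PER_TASK = 3 * HDFS_MB_PER_TASK   # ~3x temporary data = 384 MB
CORES_PER_HOST = 24                       # assumed: one task per core


def per_task_bandwidth_mb_s(data_mb: float, cpu_seconds: float) -> float:
    """Average MB/s one task moves if its I/O is spread over its CPU time."""
    return data_mb / cpu_seconds


def per_host_bandwidth_mb_s(per_task_mb_s: float, cores: int) -> float:
    """Aggregate MB/s for a host running one task on every core."""
    return per_task_mb_s * cores


hdfs_task = per_task_bandwidth_mb_s(HDFS_MB_PER_TASK, CPU_SECONDS_PER_TASK)
temp_task = per_task_bandwidth_mb_s(TEMP_MB_PER_TASK, CPU_SECONDS_PER_TASK)
hdfs_host = per_host_bandwidth_mb_s(hdfs_task, CORES_PER_HOST)
temp_host = per_host_bandwidth_mb_s(temp_task, CORES_PER_HOST)

print(f"HDFS: {hdfs_task:.1f} MB/s per task, {hdfs_host:.1f} MB/s per host")
print(f"Temp: {temp_task:.1f} MB/s per task, {temp_host:.1f} MB/s per host")
```

Unrounded, the per-host figures come out near 307 and 922 MBytes/sec. The slide's 312 and 912 follow from rounding the per-task rates to 13 and 38 MBytes/sec before multiplying by 24.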
Do we need locality?
•  Main issue is cross-sectional bandwidth
    –  Secondary issue is per-host link speed
    –  Just look at storage I/O now, consider shuffle next

I/O Component   Per-host (24 cores)   Network Bandwidth    Rack Bandwidth
                                      w/ 0% locality       w/ 40 hosts
HDFS I/O        312 MBytes/sec        2.5 Gbits/sec        100 Gbits/sec
Temp            912 MBytes/sec        7.3 Gbits/sec        300 Gbits/sec

•  Possible Conclusion
   –  Must have locality w/ 1Gbit host link
   –  Feasible to have remote data w/ 10Gbit host links, keeping only temp data local
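
The conversion behind the table and the conclusion can be sketched as follows. This is a rough model, not a measurement: it assumes decimal units (1 Gbit = 1000 Mbits), ignores protocol overhead, and treats 0% locality as every byte crossing the host link. The helper names are hypothetical.

```python
# Network demand when storage I/O is not local to the compute host.
# Per-host figures are taken from the slide; helper names are illustrative.

HDFS_MB_S_PER_HOST = 312    # MBytes/sec per 24-core host
TEMP_MB_S_PER_HOST = 912    # MBytes/sec per 24-core host
HOSTS_PER_RACK = 40


def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert MBytes/sec to Gbits/sec (8 bits per byte, decimal units)."""
    return mbytes_per_s * 8 / 1000


def rack_gbits(host_gbits: float, hosts: int) -> float:
    """Aggregate demand leaving a rack if every host's I/O is remote."""
    return host_gbits * hosts


def fits_link(demand_gbits: float, link_gbits: float) -> bool:
    """True if the demand fits within the raw link speed (no overhead modeled)."""
    return demand_gbits <= link_gbits


hdfs_gbits = mbytes_to_gbits(HDFS_MB_S_PER_HOST)
temp_gbits = mbytes_to_gbits(TEMP_MB_S_PER_HOST)

for name, gbits in (("HDFS", hdfs_gbits), ("Temp", temp_gbits)):
    rack = rack_gbits(gbits, HOSTS_PER_RACK)
    print(f"{name}: {gbits:.1f} Gbits/sec per host, {rack:.0f} Gbits/sec per rack")

# Remote HDFS I/O alone against 1 Gbit and 10 Gbit host links.
print("HDFS fits 1 Gbit link: ", fits_link(hdfs_gbits, 1.0))
print("HDFS fits 10 Gbit link:", fits_link(hdfs_gbits, 10.0))
# Remote HDFS plus remote temp data on a 10 Gbit link.
print("HDFS + temp fits 10 Gbit link:", fits_link(hdfs_gbits + temp_gbits, 10.0))
```

Under these assumptions remote HDFS I/O exceeds a 1 Gbit link but fits a 10 Gbit link. Adding remote temp data brings the total to about 9.8 Gbits/sec, which leaves almost no headroom on a 10 Gbit link and is consistent with keeping temp data local. The unrounded rack figure for temp data is about 292 Gbits/sec, which the slide rounds to 300.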

Hadoop I/O analysis using vProbes

  • 1. Architects)view)of)Hadoop)I/O) I/O)analysis)using)vProbes) ) Richard)McDougall) V1.0)) April)2012)
  • 2. Architect’s)QuesFons) •  Does)Hadoop)really)need)compute)+)data) local) •  How)much)and)what)I/O)rates)of)ephemeral) data)do)we)need)to)design)for?) •  What)I/O)paKerns)do)we)need)to)support) HDFS?) •  What)is)the)I/O)paKern)of)MNR)tasks) •  Are)there)opportuniFes)for)caching)–)map) input,)output)or)ephemeral?)
  • 3. Controlled)Small)Study) •  Focus)on)developing)tooling) •  Using)vProbes)+)Perl)+)R) •  Hadoop)0.20.204) •  Terasort)@)1GB) •  One)Namenode,)Tasktracker,)Datanode)
  • 4. Terasort) Map)Task) Map)Task) Reduce) Output)File) Input)File) Shuffle) (Sort)) Map)Task) Map)Task) Input) Splits) Sort)Chunk)of) Shuffle)output) Combine)and)Sort) (x16)) Of)KeyNValues) To)Reducers)
  • 5. Log)of)the)sort)‘Job’) $ log.pl job_201201261301_0005_1327649126255_rmc_TeraSort ! Item Time Jobname Taskname Phase Start-Time End-Time Elapsed ! Job 0.000 201201261301_0005 ! Job 201201261301_0005 ! Job 0.475 201201261301_0005 PREP ! Task 1.932 201201261301_0005 m_000017 SETUP ! MapAttempt 3.066 201201261301_0005 m_000017 SETUP ! MapAttempt 10.409 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.409 8.477 "setup"! Task 10.966 201201261301_0005 m_000017 SETUP SUCCESS 1.932 10.966 9.034 ! Job 201201261301_0005 RUNNING ! Task 10.970 201201261301_0005 m_000000 MAP ! Task 10.972 201201261301_0005 m_000001 MAP ! MapAttempt 10.981 201201261301_0005 m_000000 MAP ! MapAttempt 65.819 201201261301_0005 m_000000 MAP SUCCESS 10.970 65.819 54.849 ""! Task 68.063 201201261301_0005 m_000000 MAP SUCCESS 10.970 68.063 57.093 ! MapAttempt 10.998 201201261301_0005 m_000001 MAP ! MapAttempt 65.363 201201261301_0005 m_000001 MAP SUCCESS 10.972 65.363 54.391 ""! Task 68.065 201201261301_0005 m_000001 MAP SUCCESS 10.972 68.065 57.093 ! Task 68.066 201201261301_0005 m_000002 MAP ! Task 68.067 201201261301_0005 m_000003 MAP ! Task 68.068 201201261301_0005 r_000000 REDUCE ! MapAttempt 68.075 201201261301_0005 m_000002 MAP ! MapAttempt 139.789 201201261301_0005 m_000002 MAP SUCCESS 68.066 139.789 71.723 ""! Task 140.193 201201261301_0005 m_000002 MAP SUCCESS 68.066 140.193 72.127 ! MapAttempt 68.076 201201261301_0005 m_000003 MAP ! MapAttempt 139.927 201201261301_0005 m_000003 MAP SUCCESS 68.067 139.927 71.860 ""! Task 140.198 201201261301_0005 m_000003 MAP SUCCESS 68.067 140.198 72.131 ! …! ReduceAttempt 68.112 201201261301_0005 r_000000 REDUCE ! ReduceAttempt 795.299 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 795.299 727.231 "reduce > reduce"! Task 798.223 201201261301_0005 r_000000 REDUCE SUCCESS 68.068 798.223 730.155 ! Task 798.226 201201261301_0005 m_000016 CLEANUP ! MapAttempt 798.241 201201261301_0005 m_000016 CLEANUP ! 
MapAttempt 806.113 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 806.113 7.887 "cleanup"! Task 807.252 201201261301_0005 m_000016 CLEANUP SUCCESS 798.226 807.252 9.026 ! Job 807.253 201201261301_0005 SUCCESS 0.000 807.253 807.253 !
  • 6. Terasort:)Map)and)Reduce)Phases) Setup)Map) Elapsed)Time)N)Seconds) Mappers) Reducer) Cleanup)Map)
  • 7. Terasort:)Map)and)Reduce)Phases) Setup)Map) Elapsed)Time)N)Seconds) Zoom)in) on) Map)Task) I/O) Mappers) Zoom)in) on) Reduce) Task)I/O) Reducer) Cleanup)Map)
  • 8. VMware)vProbes) •  Dynamic) InstrumentaFon) •  Probe)mulFple) VMs) •  Probe) VirtualizaFon) Layer) •  VMware)Fusion) and)WorkstaFon)
  • 9. vProbes) GUEST:ENTER:system_call {! string path;! comm = curprocname();! tid = curtid();! pid = curpid();! ppid = curppid();! syscall_num = sysnum;! ! if(syscall_num == NR_open) {! !path = guestloadstr(sys_arg0);! syscall_name = "open";! sprintf(syscall_args, ""%s", %x, %x", path, sys_arg1, sys_arg2); ! …! }! ! GUEST:OFFSET:ret_from_sys_call:0 {! !printf("%s/%d/%d/%d %s(%s) = %d <0>n", comm, pid, rtid, ppid, syscall_name,! syscall_args, getgpr(REG_RAX)); ! }! ! ! java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>! java/14774/15467/1 stat("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 7f0b80a4e590) = 0 <0>! java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>! !
  • 10. Pathname)ResoluFon) filetracevp.pl: ! ! if ($syscall =~ m/open/) {! $path1 = $line;! $path1 =~ s/[A-z/0-9]+[ ]+[a-z]+("([^"]+)".*n/1/;! $fd1 = $line;! if ($fd1 =~ s/.* ([0-9]+) <.*>n/1/) {! $fds{$pid,$fd1} = $path1;! ! if ($syscall =~ m/write/) {! $params = $line;! if ($params =~ s/^[A-z/0-9]+[ ]+[a-z]+(([0-9]+),.* ([0-9]+)) = ([0-9]+) <(.*)>n/1,2,3,4/) {! ($fd1, $size, $bytes, $lat) = split(',', $params);! $path1 = $fds{$pid, $fd1};! …! ! ! java,14774,15467,,open,0,0,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,! java,14774,15467,,stat,0,0,0,0,0,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,! java,14774,15467,,read,4096,167,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,! ! ! ! !
  • 11. Controlled)SmallNScale)Study) $ hadoop jar hadoop-examples-0.20.204.0.jar teragen 10000000 teradata! <begin trace>! $ hadoop jar hadoop-examples-0.20.204.0.jar terasort teradata teraout! ! Job Counters ! Hadoop)Distro) 236) Launched reduce tasks=1! Hadoop)Logs) 132) SLOTS_MILLIS_MAPS=1146887! Hadoop)clienKmp)unjar) 1) Launched map tasks=16! Data-local map tasks=16! Mappers)files)jobcache)N)spills) 1753) SLOTS_MILLIS_REDUCES=766823! Mappers)files)jobcache)N)output) 1777) File Input Format Counters ! Reducer)Intermediate) 764) Bytes Read=1000057358! Reducers)Shuffle)and)Intermediate) 1744) File Output Format Counters ! Bytes Written=1000000000! Jobcache)class)files)and)shell)scripts) 1) FileSystemCounters! Hadoop)Datanode) 1690) FILE_BYTES_READ=2382257412! JVM)N)/usr/lib/jvm…) 98) HDFS_BYTES_READ=1000059070! Total&MB& 7987& FILE_BYTES_WRITTEN=3402627838! HDFS_BYTES_WRITTEN=1000000000! JVM)N)/usr/lib/jvm…) Map-Reduce Framework! Map output materialized bytes=1020000096! Hadoop)Datanode) Map input records=10000000! Jobcache)class)files)and)shell)scripts) Reduce shuffle bytes=1020000096! Spilled Records=33355441! Reducers)files)jobcache)N)output) Map output bytes=1000000000! Reducer)intermediate)file) Map input bytes=1000000000! Combine input records=0! Mappers)files)jobcache)N)map)output) SPLIT_RAW_BYTES=1712! Mappers)files)jobcache)N)spills) Reduce input records=10000000! Reduce input groups=10000000! Hadoop)clienKmp)unjar) Combine output records=0! Hadoop)Logs) Reduce output records=10000000! Hadoop)Distro) Map output records=10000000! 0) 200) 400) 600) 800)1000)1200)1400)1600)1800)2000)
  • 12. Hadoop I/O Model (With some data from early observations)
    [Diagram: a Job reads DFS Input Data into Map tasks; Map tasks sort and write spills and logs (spill*.out, file.out); Shuffle moves Map_*.out to the Reduce task, which combines and sorts through Intermediate.out and writes DFS Output Data to HDFS. Annotations: DFS input, 12% of bandwidth; spills and intermediate data, 75% of disk bandwidth; DFS output, 12% of bandwidth.]
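The 75% / 12% / 12% split on the I/O model slide is roughly what the Terasort job counters from the small-scale study give, under one reading of them. The sketch below assumes that `FILE_BYTES_READ` plus `FILE_BYTES_WRITTEN` approximates temporary (spill, shuffle, intermediate) I/O and that the `HDFS_BYTES_*` counters approximate DFS input and output. That mapping is an assumption for illustration, not something the slides state, and `io_shares` is a name made up for the example.

```python
# Counter values copied from the Terasort job output in this deck.
FILE_BYTES_READ = 2382257412
FILE_BYTES_WRITTEN = 3402627838
HDFS_BYTES_READ = 1000059070
HDFS_BYTES_WRITTEN = 1000000000


def io_shares():
    """Fraction of total bytes moved by each I/O class (assumed mapping)."""
    temp = FILE_BYTES_READ + FILE_BYTES_WRITTEN   # assumed: local temp I/O
    total = temp + HDFS_BYTES_READ + HDFS_BYTES_WRITTEN
    return {
        "temp": temp / total,
        "hdfs_read": HDFS_BYTES_READ / total,
        "hdfs_write": HDFS_BYTES_WRITTEN / total,
    }


for name, share in io_shares().items():
    print(f"{name}: {share:.1%}")
```

With these counters the shares come out at about 74% temporary, 13% HDFS read and 13% HDFS write, close to the figures on the diagram.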
  • 13. One Mapper Task: Temp Data
    path  bytes
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/file.out  67586124
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill1.out  52762519
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill0.out  52508540
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill2.out  29698564
    /usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar  5057763
    /home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar  895582
    /home/rmc/untars/hadoop-0.20.204.0/lib/log4j-1.2.15.jar  82522
    /home/rmc/untars/hadoop-0.20.204.0/lib/commons-lang-2.4.jar  70477
    /home/rmc/untars/hadoop-0.20.204.0/lib/commons-configuration-1.6.jar  61007
    /usr/lib/x86_64-linux-gnu/gconv/gconv-modules  51772
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/job.xml  44420
    /home/rmc/untars/hadoop-0.20.204.0/lib/commons-collections-3.2.1.jar  29974
    /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/job.xml  21695
    /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libnio.so  15946
    /home/rmc/untars/hadoop-0.20.204.0/conf/core-site.xml  11024
    /usr/lib/jvm/java-6-openjdk/jre/lib/security/java.security  10081
    /proc/self/maps  7523
  • 14. One Mapper Task: Temp I/O Counts
    [Figure: three histograms of Number of I/Os per power-of-two size bucket (1 to 131072), I/O measured at syscall. Panels: I/O Size Bucket, Read I/O Size Bucket, Write I/O Size Bucket.]
  • 15. One Mapper Task: Tmp Bytes Transferred
    [Figure: two histograms of Bytes per power-of-two I/O Size Bucket. Left panel: I/O measured at syscall (buckets 1 to 131072). Right panel: Logical I/O, sequential grouping of syscalls (buckets 1 to 134217728).]
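The slides compare I/O sizes "measured at syscall" with "logical I/O (sequential grouping of syscalls)" but do not spell out the grouping rule. The sketch below, in Python, shows one plausible reading: consecutive calls on the same file are merged into one logical I/O while each call starts at the offset where the previous one ended, and sizes are then binned into power-of-two buckets. The merge rule, the round-up bucketing, the `(path, offset, size)` event layout and all function names are assumptions for illustration only.

```python
from collections import Counter


def bucket(size):
    """Round a positive size up to the next power of two (assumed rule)."""
    b = 1
    while b < size:
        b *= 2
    return b


def logical_sizes(events):
    """Merge sequential I/Os per path; events are (path, offset, size)."""
    merged = []      # sizes of logical I/Os, in order of first syscall
    open_run = {}    # path -> index into merged of that path's current run
    next_off = {}    # path -> offset at which a sequential I/O would start
    for path, offset, size in events:
        if path in open_run and next_off[path] == offset:
            merged[open_run[path]] += size       # extends the current run
        else:
            merged.append(size)                  # starts a new logical I/O
            open_run[path] = len(merged) - 1
        next_off[path] = offset + size
    return merged


def bytes_per_bucket(sizes):
    """Total bytes transferred in each size bucket."""
    hist = Counter()
    for s in sizes:
        hist[bucket(s)] += s
    return dict(hist)


# Hypothetical events: three sequential 4 KB writes, one small write
# to another file, then a non-sequential 4 KB write.
events = [
    ("a", 0, 4096), ("a", 4096, 4096), ("a", 8192, 4096),
    ("b", 0, 100),
    ("a", 65536, 4096),
]

at_syscall = bytes_per_bucket([size for _, _, size in events])
logical = bytes_per_bucket(logical_sizes(events))
print(at_syscall)
print(logical)
```

In this toy input the three sequential 4 KB calls collapse into one 12 KB logical I/O, which is the kind of shift toward larger buckets that the right-hand panels of these figures show.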
  • 17. Reducer Task: Temp I/O Counts
    [Figure: three histograms of Number of I/Os per power-of-two size bucket (1 to 131072), I/O measured at syscall. Panels: I/O Size Bucket, Read I/O Size Bucket, Write I/O Size Bucket.]
  • 18. Reducer Task: Tmp Bytes Transferred
    [Figure: two histograms of Bytes per power-of-two I/O Size Bucket. Left panel: I/O measured at syscall (buckets 1 to 131072). Right panel: Logical I/O, sequential grouping of syscalls (buckets 1 to 8388608).]
  • 19. Datanode – Bytes Transferred
    [Figure: three histograms of Bytes per power-of-two size bucket (1 to 131072). Panels: I/O Size Bucket, Read I/O Size Bucket, Write I/O Size Bucket.]
  • 20. Datanode – Actual vs. Logical I/O Size
    [Figure: two histograms of Bytes per power-of-two I/O Size Bucket. Left panel: I/O measured at syscall (buckets 1 to 131072). Right panel: Logical I/O, sequential grouping of syscalls (buckets 1 to 134217728).]
  • 21. Datanode – IOPS
    [Figure: histograms of Number of I/Os per power-of-two size bucket (1 to 131072). Panels: I/O Size Bucket, Read I/O Size Bucket, Write I/O Size Bucket.]
  • 22. Back of the Envelope Modeling
    • How much bandwidth does terasort need?
      – 10 seconds of CPU/core time per task
      – 128MB of HDFS per task
      – ~3x, 384MB of temporary data per task

    I/O Component   Per-task   Per-task Bandwidth   Per-host (24 cores)
    HDFS I/O        128MB      ~13 MBytes/sec       312 MBytes/sec
    Temp            384MB      ~38 MBytes/sec       912 MBytes/sec
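The arithmetic behind this table can be reproduced in a few lines. The sketch below assumes each task spreads its I/O evenly over its 10 seconds of CPU time and that all 24 cores run one task each at the same time. Neither assumption is stated on the slide, but they give the same numbers. It also rounds the per-task rate before scaling to the host, as the table appears to do. The names used are made up for the example.

```python
CPU_SECONDS_PER_TASK = 10      # from the slide
CORES_PER_HOST = 24            # from the slide
PER_TASK_MB = {"HDFS I/O": 128, "Temp": 384}   # from the slide


def envelope():
    """Return {component: (per-task MB/s, per-host MB/s)}."""
    rows = {}
    for name, mb in PER_TASK_MB.items():
        # Round the per-task rate first, as the slide's table appears to.
        per_task = round(mb / CPU_SECONDS_PER_TASK)
        rows[name] = (per_task, per_task * CORES_PER_HOST)
    return rows


for name, (per_task, per_host) in envelope().items():
    print(f"{name}: ~{per_task} MBytes/sec per task, {per_host} MBytes/sec per host")
```

Without the intermediate rounding the per-host figures would be 307.2 and 921.6 MBytes/sec, so the conclusions do not depend on it.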
  • 23. Do we need locality?
    • Main issue is cross-sectional bandwidth
      – Secondary issue is per-host link speed
      – Just look at storage I/O now, consider shuffle next

    I/O Component   Per-host (24 cores)   Network Bandwidth w/ 0% locality   Rack Bandwidth w/ 40 hosts
    HDFS I/O        312 MBytes/sec        2.5 Gbits                          100 Gbits
    Temp            912 MBytes/sec        7.3 Gbits                          300 Gbits

    • Possible Conclusion
      – Must have locality w/ 1Gbit host link
      – Feasible to have remote data w/ 10Gbit and keeping temp local only
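The conversion from per-host storage bandwidth to network demand, and the comparison against 1 Gbit and 10 Gbit host links, can be sketched as follows. The sketch assumes decimal units (1 Gbit = 1000 Mbit) and 0% locality, so every byte crosses the host link. The function names `network_demand` and `fits` are hypothetical.

```python
PER_HOST_MBYTES = {"HDFS I/O": 312, "Temp": 912}   # MBytes/sec, from the slide
HOSTS_PER_RACK = 40                                 # from the slide


def network_demand(mbytes_per_sec):
    """Return (per-host Gbit/s, per-rack Gbit/s) at 0% locality."""
    host_gbits = mbytes_per_sec * 8 / 1000   # assumes 1 Gbit = 1000 Mbit
    return host_gbits, host_gbits * HOSTS_PER_RACK


def fits(link_gbits, components):
    """True if the listed components together fit on one host link."""
    need = sum(network_demand(PER_HOST_MBYTES[c])[0] for c in components)
    return need <= link_gbits


for name, rate in PER_HOST_MBYTES.items():
    host, rack = network_demand(rate)
    print(f"{name}: {host:.1f} Gbit/s per host, {rack:.0f} Gbit/s per rack")

print("HDFS on 1 Gbit link:", fits(1, ["HDFS I/O"]))
print("HDFS on 10 Gbit link:", fits(10, ["HDFS I/O"]))
print("HDFS + Temp on 10 Gbit link:", fits(10, ["HDFS I/O", "Temp"]))
```

Remote HDFS alone needs about 2.5 Gbit/s per host, which exceeds a 1 Gbit link but fits comfortably in 10 Gbit. Adding temp data brings the total to roughly 9.8 Gbit/s, which leaves almost no headroom for shuffle traffic and supports keeping temp local.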