GCGGCATAAATTGGATAAAGAAAGAACTGAAGGACA
>100000
GTTACCATGTATTGTGACAGATAACCACGGTGGAGT
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -mkdir /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/s_suis.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/100k.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar
Usage: CloudBurst refpath qrypath outpath minreadlen maxreadlen k allowdifferences filteralignments #mappers #reduces #fmappers #freducers blocksize redundancy
1. refpath: path in hdfs to the reference file
2. qrypath: path in hdfs to the query file
3. outpath: path to a directory to store the results (old results are automatically deleted)
4. minreadlen: minimum length of the reads
5. maxreadlen: maximum read length
6. k: number of mismatches / differences to allow (higher number requires more time)
7. allowdifferences: 0: mismatches only, 1: indels as well
8. filteralignments: 0: all alignments, 1: only report unambiguous best alignment (results identical to RMAP)
9. #mappers: number of mappers to use. suggested: #processor-cores * 10
10. #reduces: number of reducers to use. suggested: #processor-cores * 2
11. #fmappers: number of mappers for filtration alg. suggested: #processor-cores
12. #freducers: number of reducers for filtration alg. suggested: #processor-cores
13. blocksize: number of qry and ref tuples to consider at a time in the reduce phase. suggested: 128
14. redundancy: number of copies of low complexity seeds to use. suggested: # processor cores
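
The usage text above ties most of the task-count arguments to the number of processor cores. As a rough sketch, those suggestions can be computed from a core count; the function name `suggest_cloudburst_params` is hypothetical and not part of CloudBurst, and the core count of 24 is an assumption inferred from the values used in this session.

```shell
# Hypothetical helper (not part of CloudBurst): print the arguments suggested
# by the usage text for a cluster with the given total number of cores.
# Output order: #mappers #reduces #fmappers #freducers blocksize redundancy
suggest_cloudburst_params() {
    cores=$1
    # mappers = cores * 10, reducers = cores * 2,
    # filtration mappers/reducers = cores, blocksize = 128, redundancy = cores
    echo "$((cores * 10)) $((cores * 2)) $cores $cores 128 $cores"
}

# Assumed core count; prints: 240 48 24 24 128 24
suggest_cloudburst_params 24
```

The run in this session uses the same values except for redundancy, which is set to 16 rather than 24.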
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar /data/cloudburst/s_suis.br \
    /data/cloudburst/100k.br /data/results 36 36 3 0 1 240 48 24 24 128 16 >& cloudburst.err
[hadoop@skcc-nebdap02 hadoop]$ cat cloudburst.err
refath: /data/cloudburst/s_suis.br
qrypath: /data/cloudburst/100k.br
outpath: /data/results-alignments
MIN_READ_LEN: 36
MAX_READ_LEN: 36
K: 3
SEED_LEN: 9
FLANK_LEN: 30
ALLOW_DIFFERENCES: 0
FILTER_ALIGNMENTS: true
NUM_MAP_TASKS: 240
NUM_REDUCE_TASKS: 48
BLOCK_SIZE: 128
REDUNDANCY: 16
Removing old results
12/06/15 17:11:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:11:28 INFO mapred.FileInputFormat: Total input paths to process : 2
12/06/15 17:11:28 INFO mapred.JobClient: Running job: job_201206112243_0018
12/06/15 17:11:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:11:47 INFO mapred.JobClient: map 12% reduce 0%
12/06/15 17:11:48 INFO mapred.JobClient: map 14% reduce 0%
12/06/15 17:11:49 INFO mapred.JobClient: map 15% reduce 0%
12/06/15 17:11:50 INFO mapred.JobClient: map 17% reduce 0%
12/06/15 17:11:51 INFO mapred.JobClient: map 19% reduce 0%
12/06/15 17:11:52 INFO mapred.JobClient: map 21% reduce 0%
12/06/15 17:11:53 INFO mapred.JobClient: map 36% reduce 0%
12/06/15 17:11:54 INFO mapred.JobClient: map 40% reduce 0%
12/06/15 17:11:55 INFO mapred.JobClient: map 45% reduce 0%
12/06/15 17:11:56 INFO mapred.JobClient: map 49% reduce 0%
12/06/15 17:11:57 INFO mapred.JobClient: map 56% reduce 0%
12/06/15 17:11:58 INFO mapred.JobClient: map 57% reduce 0%
12/06/15 17:11:59 INFO mapred.JobClient: map 74% reduce 0%
12/06/15 17:12:00 INFO mapred.JobClient: map 80% reduce 1%
12/06/15 17:12:01 INFO mapred.JobClient: map 80% reduce 2%
12/06/15 17:12:02 INFO mapred.JobClient: map 83% reduce 3%
12/06/15 17:12:03 INFO mapred.JobClient: map 91% reduce 4%
12/06/15 17:12:05 INFO mapred.JobClient: map 95% reduce 6%
12/06/15 17:12:06 INFO mapred.JobClient: map 95% reduce 9%
12/06/15 17:12:07 INFO mapred.JobClient: map 95% reduce 10%
12/06/15 17:12:08 INFO mapred.JobClient: map 100% reduce 14%
12/06/15 17:12:09 INFO mapred.JobClient: map 100% reduce 17%
12/06/15 17:12:10 INFO mapred.JobClient: map 100% reduce 18%
12/06/15 17:12:11 INFO mapred.JobClient: map 100% reduce 22%
12/06/15 17:12:13 INFO mapred.JobClient: map 100% reduce 23%
12/06/15 17:12:14 INFO mapred.JobClient: map 100% reduce 28%
12/06/15 17:12:15 INFO mapred.JobClient: map 100% reduce 31%
12/06/15 17:12:17 INFO mapred.JobClient: map 100% reduce 51%
12/06/15 17:12:18 INFO mapred.JobClient: map 100% reduce 65%
12/06/15 17:12:19 INFO mapred.JobClient: map 100% reduce 70%
12/06/15 17:12:20 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:12:21 INFO mapred.JobClient: map 100% reduce 92%
12/06/15 17:12:22 INFO mapred.JobClient: map 100% reduce 94%
12/06/15 17:12:23 INFO mapred.JobClient: map 100% reduce 98%
12/06/15 17:12:26 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:12:31 INFO mapred.JobClient: Job complete: job_201206112243_0018
12/06/15 17:12:32 INFO mapred.JobClient: Counters: 31
12/06/15 17:12:32 INFO mapred.JobClient: Job Counters
12/06/15 17:12:32 INFO mapred.JobClient: Launched reduce tasks=48
12/06/15 17:12:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2980992
12/06/15 17:12:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient: Rack-local map tasks=158
12/06/15 17:12:32 INFO mapred.JobClient: Launched map tasks=241
12/06/15 17:12:32 INFO mapred.JobClient: Data-local map tasks=83
12/06/15 17:12:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1106915
12/06/15 17:12:32 INFO mapred.JobClient: File Input Format Counters
12/06/15 17:12:32 INFO mapred.JobClient: Bytes Read=5587101
12/06/15 17:12:32 INFO mapred.JobClient: File Output Format Counters
12/06/15 17:12:32 INFO mapred.JobClient: Bytes Written=2707836
12/06/15 17:12:32 INFO mapred.JobClient: FileSystemCounters
12/06/15 17:12:32 INFO mapred.JobClient: FILE_BYTES_READ=140515797
12/06/15 17:12:32 INFO mapred.JobClient: HDFS_BYTES_READ=6112267
12/06/15 17:12:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=288167030
12/06/15 17:12:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2707836
12/06/15 17:12:32 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 17:12:32 INFO mapred.JobClient: Map output materialized bytes=140584917
12/06/15 17:12:32 INFO mapred.JobClient: Map input records=100032
12/06/15 17:12:32 INFO mapred.JobClient: Reduce shuffle bytes=140436273
12/06/15 17:12:32 INFO mapred.JobClient: Spilled Records=5558658
12/06/15 17:12:32 INFO mapred.JobClient: Map output bytes=134956851
12/06/15 17:12:32 INFO mapred.JobClient: Total committed heap usage (bytes)=57936314368
12/06/15 17:12:32 INFO mapred.JobClient: CPU time spent (ms)=1693370
12/06/15 17:12:32 INFO mapred.JobClient: Map input bytes=5073092
12/06/15 17:12:32 INFO mapred.JobClient: SPLIT_RAW_BYTES=24638
12/06/15 17:12:32 INFO mapred.JobClient: Combine input records=0
12/06/15 17:12:32 INFO mapred.JobClient: Reduce input records=2774585
12/06/15 17:12:32 INFO mapred.JobClient: Reduce input groups=254196
12/06/15 17:12:32 INFO mapred.JobClient: Combine output records=0
12/06/15 17:12:32 INFO mapred.JobClient: Physical memory (bytes) snapshot=57459982336
12/06/15 17:12:32 INFO mapred.JobClient: Reduce output records=81128
12/06/15 17:12:32 INFO mapred.JobClient: Virtual memory (bytes) snapshot=754874736640
12/06/15 17:12:32 INFO mapred.JobClient: Map output records=2779329
CloudBurst Finished
Alignment time: 65.36
NUM_FMAP_TASKS: 24
NUM_FREDUCE_TASKS: 24
Removing old results
12/06/15 17:12:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:12:32 INFO mapred.FileInputFormat: Total input paths to process : 48
12/06/15 17:12:39 INFO mapred.JobClient: Running job: job_201206112243_0019
12/06/15 17:12:40 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:12:54 INFO mapred.JobClient: map 62% reduce 0%
12/06/15 17:12:55 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 17:13:06 INFO mapred.JobClient: map 100% reduce 16%
12/06/15 17:13:07 INFO mapred.JobClient: map 100% reduce 33%
12/06/15 17:13:09 INFO mapred.JobClient: map 100% reduce 58%
12/06/15 17:13:10 INFO mapred.JobClient: map 100% reduce 75%
12/06/15 17:13:12 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:13:13 INFO mapred.JobClient: map 100% reduce 91%
12/06/15 17:13:15 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:13:20 INFO mapred.JobClient: Job complete: job_201206112243_0019
12/06/15 17:13:20 INFO mapred.JobClient: Counters: 31
12/06/15 17:13:20 INFO mapred.JobClient: Job Counters
12/06/15 17:13:20 INFO mapred.JobClient: Launched reduce tasks=24
12/06/15 17:13:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=207232
12/06/15 17:13:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient: Rack-local map tasks=5
12/06/15 17:13:20 INFO mapred.JobClient: Launched map tasks=48
12/06/15 17:13:20 INFO mapred.JobClient: Data-local map tasks=43
12/06/15 17:13:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=245651
12/06/15 17:13:20 INFO mapred.JobClient: File Input Format Counters
12/06/15 17:13:20 INFO mapred.JobClient: Bytes Read=2707836
12/06/15 17:13:20 INFO mapred.JobClient: File Output Format Counters
12/06/15 17:13:20 INFO mapred.JobClient: Bytes Written=2485042
12/06/15 17:13:20 INFO mapred.JobClient: FileSystemCounters
12/06/15 17:13:20 INFO mapred.JobClient: FILE_BYTES_READ=2188332
12/06/15 17:13:20 INFO mapred.JobClient: HDFS_BYTES_READ=2713260
12/06/15 17:13:20 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6039532
12/06/15 17:13:20 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2485042
12/06/15 17:13:20 INFO mapred.JobClient: Map-Reduce Framework
12/06/15 17:13:20 INFO mapred.JobClient: Map output materialized bytes=2195100
12/06/15 17:13:20 INFO mapred.JobClient: Map input records=81128
12/06/15 17:13:20 INFO mapred.JobClient: Reduce shuffle bytes=2153793
12/06/15 17:13:20 INFO mapred.JobClient: Spilled Records=162088
12/06/15 17:13:20 INFO mapred.JobClient: Map output bytes=2028200
12/06/15 17:13:20 INFO mapred.JobClient: Total committed heap usage (bytes)=14471921664
12/06/15 17:13:20 INFO mapred.JobClient: CPU time spent (ms)=95390
12/06/15 17:13:20 INFO mapred.JobClient: Map input bytes=2703324
12/06/15 17:13:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=5424
12/06/15 17:13:20 INFO mapred.JobClient: Combine input records=81128
12/06/15 17:13:20 INFO mapred.JobClient: Reduce input records=81044
12/06/15 17:13:20 INFO mapred.JobClient: Reduce input groups=76511
12/06/15 17:13:20 INFO mapred.JobClient: Combine output records=81044
12/06/15 17:13:20 INFO mapred.JobClient: Physical memory (bytes) snapshot=13169172480
12/06/15 17:13:20 INFO mapred.JobClient: Reduce output records=74502
12/06/15 17:13:20 INFO mapred.JobClient: Virtual memory (bytes) snapshot=193761902592
12/06/15 17:13:20 INFO mapred.JobClient: Map output records=81128
FilterAlignments Finished
Filtering time: 48.481
Total Running time: 113.841
[hadoop@skcc-nebdap02 hadoop]$
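
The log ends with three timing lines (alignment, filtering, total), apparently in seconds judging by the job timestamps. A minimal sketch for pulling them out of `cloudburst.err` and checking that the two phases add up to the reported total follows; the function name `check_cloudburst_times` is hypothetical, and the sample input simply repeats the three lines from the log above.

```shell
# Hypothetical helper: read a CloudBurst log on stdin and print the two phase
# times, their sum, and the total reported by CloudBurst.
check_cloudburst_times() {
    awk -F': ' '
        /^Alignment time:/     { align  = $2 }
        /^Filtering time:/     { filter = $2 }
        /^Total Running time:/ { total  = $2 }
        END {
            printf "align=%.3f filter=%.3f sum=%.3f total=%.3f\n",
                   align, filter, align + filter, total
        }'
}

# Sample input copied from the log; on the cluster one would instead run:
#   check_cloudburst_times < cloudburst.err
printf '%s\n' \
    'Alignment time: 65.36' \
    'Filtering time: 48.481' \
    'Total Running time: 113.841' | check_cloudburst_times
```

For this run the sum of the two phases (65.36 + 48.481) matches the reported total of 113.841.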
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -get /data/results/ ../CloudBurst-1.1.0/results
[hadoop@skcc-nebdap02 hadoop]$ cd ../CloudBurst-1.1.0
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ java -jar PrintAlignments.jar results | sort -nk4 > 100k.3.txt
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head -n 20 100k.3.txt
1 766133 766169 1 1 +
1 297899 297935 2 0 -
1 1325118 1325154 4 1 +
1 145970 146006 7 1 -
1 553513 553549 8 0 -
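
The alignment listing can be summarized with a short pipeline. The sketch below counts alignments per number of differences. It assumes the columns are reference id, start, end, read id, differences, and strand; that layout is inferred from the output above (end minus start equals the read length of 36, and `sort -nk4` orders by the fourth column) and is not confirmed by the source. The function name `count_by_differences` is hypothetical.

```shell
# Assumed column layout (inferred, not confirmed):
#   1 reference id, 2 start, 3 end, 4 read id, 5 differences, 6 strand
# Hypothetical helper: read alignment lines on stdin and print
# "<differences> <count>" pairs, ordered by number of differences.
count_by_differences() {
    awk '{ n[$5]++ } END { for (d in n) print d, n[d] }' | sort -n
}

# Sample input: the five alignments shown above. On the real file one would run:
#   count_by_differences < 100k.3.txt
printf '%s\n' \
    '1 766133 766169 1 1 +' \
    '1 297899 297935 2 0 -' \
    '1 1325118 1325154 4 1 +' \
    '1 145970 146006 7 1 -' \
    '1 553513 553549 8 0 -' | count_by_differences
```

On these five lines it reports two alignments with 0 differences and three with 1.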