Reference: http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.php?title=Sample_Results



[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ ll
total 24332
-rw-r--r-- 1 hadoop hadoop 1995984 Jun 15 17:19 100k.3.txt
-rw-r--r-- 1 hadoop hadoop 1995984 Dec 5 2008 100k.3.txt.gold
-rw-r--r-- 1 hadoop hadoop 4493593 Jun 15 17:06 100k.br
-rw-r--r-- 1 hadoop hadoop 4388895 Dec 5 2008 100k.fa
-rw-r--r-- 1 hadoop hadoop 1177790 Jun 15 17:06 100k.fa.map
-rw-r--r-- 1 hadoop hadoop    8337 Dec 5 2008 cloudburst.err.gold
-rw-r--r-- 1 hadoop hadoop   57014 Jul 9 2010 CloudBurst.jar
-rw-r--r-- 1 hadoop hadoop 4067962 Jul 9 2010 ConvertFastaForCloud.jar
-rw-r--r-- 1 hadoop hadoop 4067959 Jul 9 2010 PrintAlignments.jar
-rw-r--r-- 1 hadoop hadoop    1452 Jul 9 2010 README.txt
drwxr-xr-x 2 hadoop hadoop     4096 Jun 15 17:19 results
-rw-r--r-- 1 hadoop hadoop 579773 Jun 15 17:06 s_suis.br
-rw-r--r-- 1 hadoop hadoop 2040970 Dec 5 2008 s_suis.fa
-rw-r--r-- 1 hadoop hadoop     21 Jun 15 17:06 s_suis.fa.map
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ cat README.txt
Sample data for CloudBurst
==========================


CloudBurst has several parameters to control the sensitivity of the
alignment algorithm. Here it finds the unambiguous best alignment for
100,000 reads allowing up to 3 mismatches when mapping to the corresponding
S. suis genome.




== Sample input data


s_suis.fa: Streptococcus suis reference genome sequence
100k.fa:   100,000 36bp Illumina reads available from
       http://www.sanger.ac.uk/Projects/S_suis/


== Format the input data
$ java -jar ConvertFastaForCloud.jar s_suis.fa s_suis.br
$ java -jar ConvertFastaForCloud.jar 100k.fa 100k.br
s_suis.br: reference genome in CloudBurst binary format
100k.br:   Reads in CloudBurst binary format


... (omitted) ...


[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head s_suis.fa
>Streptococcus_suis
atgaaccaagaacaacttttttggcaacgatttattgaattggcaaaggtaaattttaag
ccatctatttatgatttttatgtcgctgatgcaaaattactcggaatcaaccagcaagtt
gccaatattttcttaaatcgtccatttaaaaaagatttctgggaaaaaaacttcgaagag
ttaatgattgccgctagttttgaaagctacggagagcctcttaccatccaatatcaattt
... (omitted) ...
acagaggatgaacaggagattaggaatactacaaacacaagaagttcaatagttcaccag
gtacagacacttgagccggctactcctcaagaaacttttaaaccggttcattctgatata
aaatcccagtacacctttgctaattttgtacaaggagacaataatcactgggcaaaggct
gcagctttagctgtatctgataacctaggtgagctctacaatccattattcatttttggt
ggtcctggtcttggaaaaactcatattttaaatgcgattggaaataaggttctagccgat
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ wc -l s_suis.fa
33460 s_suis.fa
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head 100k.fa
>1
GCCTGTTCTTTACATGATTTTTGGTCTAGTGTATGG
>2
AACCGCTGTAAAGGCTTCTGCCACACCGATTTCTTG
>3
GAGGTGATTGTGGTATTGT.GGTAAATCGGTGATTG
>4
GCTTTAGCCGACCTGAACT.GACTACAAGTTGACCA
>5
AAAGGCTACCCGCGGTTGAACCTTACGTGACACATT
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ tail 100k.fa
>99996
AATGCCCGTAACAACGGGCTTTTATCTTGTTCTAAA
>99997
GTCAGATAGCGCAGGAATTTCAAAGGAATTTGGACC
>99998
AGTTAACTCTTCAGCTGTAAAGTTGTAGTTTTCTAA
>99999
GCGGCATAAATTGGATAAAGAAAGAACTGAAGGACA
>100000
GTTACCATGTATTGTGACAGATAACCACGGTGGAGT
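
Before choosing minreadlen and maxreadlen, it can help to confirm that every read really has the same length. The sketch below is only an illustration, not part of CloudBurst: the helper names read_fasta and summarize_reads are hypothetical, and the sample is the five reads printed by `head 100k.fa` above. It also counts reads with characters other than A, C, G, T, such as the "." in reads 3 and 4.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def summarize_reads(handle):
    """Return (read count, min length, max length, reads with non-ACGT characters)."""
    count, min_len, max_len, ambiguous = 0, None, 0, 0
    for _, seq in read_fasta(handle):
        count += 1
        n = len(seq)
        min_len = n if min_len is None else min(min_len, n)
        max_len = max(max_len, n)
        if set(seq.upper()) - set("ACGT"):
            ambiguous += 1
    return count, min_len, max_len, ambiguous

# The five reads shown by `head 100k.fa`.
SAMPLE_READS = """>1
GCCTGTTCTTTACATGATTTTTGGTCTAGTGTATGG
>2
AACCGCTGTAAAGGCTTCTGCCACACCGATTTCTTG
>3
GAGGTGATTGTGGTATTGT.GGTAAATCGGTGATTG
>4
GCTTTAGCCGACCTGAACT.GACTACAAGTTGACCA
>5
AAAGGCTACCCGCGGTTGAACCTTACGTGACACATT
"""

print(summarize_reads(StringIO(SAMPLE_READS)))
```

For the full file, replace the StringIO sample with `open("100k.fa")`.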
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -mkdir /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/s_suis.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -put ../CloudBurst-1.1.0/100k.br /data/cloudburst
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar
Usage: CloudBurst refpath qrypath outpath minreadlen maxreadlen k allowdifferences filteralignments #mappers #reduces #fmappers #freducers blocksize redundancy


1. refpath:         path in hdfs to the reference file
2. qrypath:         path in hdfs to the query file
3. outpath:         path to a directory to store the results (old results are automatically deleted)
4. minreadlen:        minimum length of the reads
5. maxreadlen:         maximum read length
6. k:             number of mismatches / differences to allow (higher number requires more time)
7. allowdifferences: 0: mismatches only, 1: indels as well
8. filteralignments: 0: all alignments, 1: only report unambiguous best alignment (results identical to RMAP)
9. #mappers:          number of mappers to use.             suggested: #processor-cores * 10
10. #reduces:        number of reducers to use.            suggested: #processor-cores * 2
11. #fmappers:        number of mappers for filtration alg. suggested: #processor-cores
12. #freducers:       number of reducers for filtration alg. suggested: #processor-cores
13. blocksize:       number of qry and ref tuples to consider at a time in the reduce phase. suggested: 128
14. redundancy:        number of copies of low complexity seeds to use. suggested: # processor cores
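
The "suggested" values in the usage text are all simple multiples of the number of processor cores, so the 14 positional arguments can be derived from one number. The following is a minimal sketch under assumptions: suggested_args is a hypothetical helper, and the core count of 24 is only inferred from the 240 mappers used in this walkthrough. Note that the run in this walkthrough passes 16 for redundancy rather than the core count.

```python
def suggested_args(cores, refpath, qrypath, outpath, readlen, k,
                   allow_differences=0, filter_alignments=1):
    """Build the 14 positional CloudBurst arguments from the usage text's suggestions."""
    return [
        refpath, qrypath, outpath,
        readlen, readlen,      # minreadlen, maxreadlen (fixed-length reads)
        k,                     # mismatches / differences to allow
        allow_differences,     # 0: mismatches only, 1: indels as well
        filter_alignments,     # 1: only the unambiguous best alignment
        cores * 10,            # #mappers
        cores * 2,             # #reduces
        cores,                 # #fmappers
        cores,                 # #freducers
        128,                   # blocksize
        cores,                 # redundancy
    ]

# Assumed cluster size: 24 cores (hypothetical, inferred from 240 mappers).
args = suggested_args(24, "/data/cloudburst/s_suis.br",
                      "/data/cloudburst/100k.br", "/data/results", 36, 3)
print("bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar "
      + " ".join(str(a) for a in args))
```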
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar /data/cloudburst/s_suis.br /data/cloudburst/100k.br /data/results 36 36 3 0 1 240 48 24 24 128 16 >& cloudburst.err
[hadoop@skcc-nebdap02 hadoop]$ cat cloudburst.err
refath: /data/cloudburst/s_suis.br
qrypath: /data/cloudburst/100k.br
outpath: /data/results-alignments
MIN_READ_LEN: 36
MAX_READ_LEN: 36
K: 3
SEED_LEN: 9
FLANK_LEN: 30
ALLOW_DIFFERENCES: 0
FILTER_ALIGNMENTS: true
NUM_MAP_TASKS: 240
NUM_REDUCE_TASKS: 48
BLOCK_SIZE: 128
REDUNDANCY: 16
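
SEED_LEN and FLANK_LEN are not command-line arguments; they are derived from the read length and K. The logged values (9 and 30) are consistent with the usual seed-and-extend reasoning: a 36 bp read with at most 3 mismatches, cut into 3 + 1 pieces, must contain at least one piece that matches exactly. The formulas in the sketch below are an assumption inferred from these logged values, not taken from the CloudBurst source, and seed_and_flank is a hypothetical name.

```python
def seed_and_flank(min_read_len, max_read_len, k):
    """Assumed derivation: exact-match seed length and the flanking length kept around it."""
    seed_len = min_read_len // (k + 1)       # at least one of k+1 pieces is error-free
    flank_len = max_read_len - seed_len + k  # rest of the read, plus room for k differences
    return seed_len, flank_len

print(seed_and_flank(36, 36, 3))
```

With MIN_READ_LEN = MAX_READ_LEN = 36 and K = 3 this reproduces the logged SEED_LEN and FLANK_LEN.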
 Removing old results
12/06/15 17:11:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:11:28 INFO mapred.FileInputFormat: Total input paths to process : 2
12/06/15 17:11:28 INFO mapred.JobClient: Running job: job_201206112243_0018
12/06/15 17:11:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:11:47 INFO mapred.JobClient: map 12% reduce 0%
12/06/15 17:11:48 INFO mapred.JobClient: map 14% reduce 0%
12/06/15 17:11:49 INFO mapred.JobClient: map 15% reduce 0%
12/06/15 17:11:50 INFO mapred.JobClient: map 17% reduce 0%
12/06/15 17:11:51 INFO mapred.JobClient: map 19% reduce 0%
12/06/15 17:11:52 INFO mapred.JobClient: map 21% reduce 0%
12/06/15 17:11:53 INFO mapred.JobClient: map 36% reduce 0%
12/06/15 17:11:54 INFO mapred.JobClient: map 40% reduce 0%
12/06/15 17:11:55 INFO mapred.JobClient: map 45% reduce 0%
12/06/15 17:11:56 INFO mapred.JobClient: map 49% reduce 0%
12/06/15 17:11:57 INFO mapred.JobClient: map 56% reduce 0%
12/06/15 17:11:58 INFO mapred.JobClient: map 57% reduce 0%
12/06/15 17:11:59 INFO mapred.JobClient: map 74% reduce 0%
12/06/15 17:12:00 INFO mapred.JobClient: map 80% reduce 1%
12/06/15 17:12:01 INFO mapred.JobClient: map 80% reduce 2%
12/06/15 17:12:02 INFO mapred.JobClient: map 83% reduce 3%
12/06/15 17:12:03 INFO mapred.JobClient: map 91% reduce 4%
12/06/15 17:12:05 INFO mapred.JobClient: map 95% reduce 6%
12/06/15 17:12:06 INFO mapred.JobClient: map 95% reduce 9%
12/06/15 17:12:07 INFO mapred.JobClient: map 95% reduce 10%
12/06/15 17:12:08 INFO mapred.JobClient: map 100% reduce 14%
12/06/15 17:12:09 INFO mapred.JobClient: map 100% reduce 17%
12/06/15 17:12:10 INFO mapred.JobClient: map 100% reduce 18%
12/06/15 17:12:11 INFO mapred.JobClient: map 100% reduce 22%
12/06/15 17:12:13 INFO mapred.JobClient: map 100% reduce 23%
12/06/15 17:12:14 INFO mapred.JobClient: map 100% reduce 28%
12/06/15 17:12:15 INFO mapred.JobClient: map 100% reduce 31%
12/06/15 17:12:17 INFO mapred.JobClient: map 100% reduce 51%
12/06/15 17:12:18 INFO mapred.JobClient: map 100% reduce 65%
12/06/15 17:12:19 INFO mapred.JobClient: map 100% reduce 70%
12/06/15 17:12:20 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:12:21 INFO mapred.JobClient: map 100% reduce 92%
12/06/15 17:12:22 INFO mapred.JobClient: map 100% reduce 94%
12/06/15 17:12:23 INFO mapred.JobClient: map 100% reduce 98%
12/06/15 17:12:26 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:12:31 INFO mapred.JobClient: Job complete: job_201206112243_0018
12/06/15 17:12:32 INFO mapred.JobClient: Counters: 31
12/06/15 17:12:32 INFO mapred.JobClient:   Job Counters
12/06/15 17:12:32 INFO mapred.JobClient:    Launched reduce tasks=48
12/06/15 17:12:32 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=2980992
12/06/15 17:12:32 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:12:32 INFO mapred.JobClient:    Rack-local map tasks=158
12/06/15 17:12:32 INFO mapred.JobClient:    Launched map tasks=241
12/06/15 17:12:32 INFO mapred.JobClient:    Data-local map tasks=83
12/06/15 17:12:32 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=1106915
12/06/15 17:12:32 INFO mapred.JobClient:   File Input Format Counters
12/06/15 17:12:32 INFO mapred.JobClient:    Bytes Read=5587101
12/06/15 17:12:32 INFO mapred.JobClient:   File Output Format Counters
12/06/15 17:12:32 INFO mapred.JobClient:    Bytes Written=2707836
12/06/15 17:12:32 INFO mapred.JobClient:   FileSystemCounters
12/06/15 17:12:32 INFO mapred.JobClient:    FILE_BYTES_READ=140515797
12/06/15 17:12:32 INFO mapred.JobClient:    HDFS_BYTES_READ=6112267
12/06/15 17:12:32 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=288167030
12/06/15 17:12:32 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=2707836
12/06/15 17:12:32 INFO mapred.JobClient:   Map-Reduce Framework
12/06/15 17:12:32 INFO mapred.JobClient:    Map output materialized bytes=140584917
12/06/15 17:12:32 INFO mapred.JobClient:    Map input records=100032
12/06/15 17:12:32 INFO mapred.JobClient:    Reduce shuffle bytes=140436273
12/06/15 17:12:32 INFO mapred.JobClient:    Spilled Records=5558658
12/06/15 17:12:32 INFO mapred.JobClient:    Map output bytes=134956851
12/06/15 17:12:32 INFO mapred.JobClient:    Total committed heap usage (bytes)=57936314368
12/06/15 17:12:32 INFO mapred.JobClient:    CPU time spent (ms)=1693370
12/06/15 17:12:32 INFO mapred.JobClient:    Map input bytes=5073092
12/06/15 17:12:32 INFO mapred.JobClient:    SPLIT_RAW_BYTES=24638
12/06/15 17:12:32 INFO mapred.JobClient:    Combine input records=0
12/06/15 17:12:32 INFO mapred.JobClient:    Reduce input records=2774585
12/06/15 17:12:32 INFO mapred.JobClient:     Reduce input groups=254196
12/06/15 17:12:32 INFO mapred.JobClient:     Combine output records=0
12/06/15 17:12:32 INFO mapred.JobClient:     Physical memory (bytes) snapshot=57459982336
12/06/15 17:12:32 INFO mapred.JobClient:     Reduce output records=81128
12/06/15 17:12:32 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=754874736640
12/06/15 17:12:32 INFO mapred.JobClient:     Map output records=2779329
CloudBurst Finished
Alignment time: 65.36
NUM_FMAP_TASKS: 24
NUM_FREDUCE_TASKS: 24
 Removing old results
12/06/15 17:12:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/15 17:12:32 INFO mapred.FileInputFormat: Total input paths to process : 48
12/06/15 17:12:39 INFO mapred.JobClient: Running job: job_201206112243_0019
12/06/15 17:12:40 INFO mapred.JobClient: map 0% reduce 0%
12/06/15 17:12:54 INFO mapred.JobClient: map 62% reduce 0%
12/06/15 17:12:55 INFO mapred.JobClient: map 100% reduce 0%
12/06/15 17:13:06 INFO mapred.JobClient: map 100% reduce 16%
12/06/15 17:13:07 INFO mapred.JobClient: map 100% reduce 33%
12/06/15 17:13:09 INFO mapred.JobClient: map 100% reduce 58%
12/06/15 17:13:10 INFO mapred.JobClient: map 100% reduce 75%
12/06/15 17:13:12 INFO mapred.JobClient: map 100% reduce 87%
12/06/15 17:13:13 INFO mapred.JobClient: map 100% reduce 91%
12/06/15 17:13:15 INFO mapred.JobClient: map 100% reduce 100%
12/06/15 17:13:20 INFO mapred.JobClient: Job complete: job_201206112243_0019
12/06/15 17:13:20 INFO mapred.JobClient: Counters: 31
12/06/15 17:13:20 INFO mapred.JobClient:    Job Counters
12/06/15 17:13:20 INFO mapred.JobClient:     Launched reduce tasks=24
12/06/15 17:13:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=207232
12/06/15 17:13:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/06/15 17:13:20 INFO mapred.JobClient:     Rack-local map tasks=5
12/06/15 17:13:20 INFO mapred.JobClient:     Launched map tasks=48
12/06/15 17:13:20 INFO mapred.JobClient:     Data-local map tasks=43
12/06/15 17:13:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=245651
12/06/15 17:13:20 INFO mapred.JobClient:    File Input Format Counters
12/06/15 17:13:20 INFO mapred.JobClient:     Bytes Read=2707836
12/06/15 17:13:20 INFO mapred.JobClient:     File Output Format Counters
12/06/15 17:13:20 INFO mapred.JobClient:      Bytes Written=2485042
12/06/15 17:13:20 INFO mapred.JobClient:     FileSystemCounters
12/06/15 17:13:20 INFO mapred.JobClient:      FILE_BYTES_READ=2188332
12/06/15 17:13:20 INFO mapred.JobClient:      HDFS_BYTES_READ=2713260
12/06/15 17:13:20 INFO mapred.JobClient:      FILE_BYTES_WRITTEN=6039532
12/06/15 17:13:20 INFO mapred.JobClient:      HDFS_BYTES_WRITTEN=2485042
12/06/15 17:13:20 INFO mapred.JobClient:     Map-Reduce Framework
12/06/15 17:13:20 INFO mapred.JobClient:      Map output materialized bytes=2195100
12/06/15 17:13:20 INFO mapred.JobClient:      Map input records=81128
12/06/15 17:13:20 INFO mapred.JobClient:      Reduce shuffle bytes=2153793
12/06/15 17:13:20 INFO mapred.JobClient:      Spilled Records=162088
12/06/15 17:13:20 INFO mapred.JobClient:      Map output bytes=2028200
12/06/15 17:13:20 INFO mapred.JobClient:      Total committed heap usage (bytes)=14471921664
12/06/15 17:13:20 INFO mapred.JobClient:      CPU time spent (ms)=95390
12/06/15 17:13:20 INFO mapred.JobClient:      Map input bytes=2703324
12/06/15 17:13:20 INFO mapred.JobClient:      SPLIT_RAW_BYTES=5424
12/06/15 17:13:20 INFO mapred.JobClient:      Combine input records=81128
12/06/15 17:13:20 INFO mapred.JobClient:      Reduce input records=81044
12/06/15 17:13:20 INFO mapred.JobClient:      Reduce input groups=76511
12/06/15 17:13:20 INFO mapred.JobClient:      Combine output records=81044
12/06/15 17:13:20 INFO mapred.JobClient:      Physical memory (bytes) snapshot=13169172480
12/06/15 17:13:20 INFO mapred.JobClient:      Reduce output records=74502
12/06/15 17:13:20 INFO mapred.JobClient:      Virtual memory (bytes) snapshot=193761902592
12/06/15 17:13:20 INFO mapred.JobClient:      Map output records=81128
FilterAlignments Finished
Filtering time: 48.481
Total Running time: 113.841
[hadoop@skcc-nebdap02 hadoop]$
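
The log above covers two MapReduce jobs (alignment, then filtering), and the figures worth checking are spread across it. The sketch below is one way to pull them out; parse_timings and parse_counter are hypothetical helpers, and the sample text is a few lines copied from the log. It checks that the alignment and filtering times add up to the total, and shows the record count before and after filtering.

```python
import re

def parse_timings(log_text):
    """Extract the 'Alignment time', 'Filtering time' and 'Total Running time' values (seconds)."""
    pattern = re.compile(
        r"^(Alignment time|Filtering time|Total Running time):[ \t]*([\d.]+)[ \t]*$",
        re.MULTILINE,
    )
    return {name: float(value) for name, value in pattern.findall(log_text)}

def parse_counter(log_text, counter):
    """Return every logged value of a Hadoop counter, in order of appearance."""
    pattern = re.compile(r"mapred\.JobClient:\s+" + re.escape(counter) + r"=(\d+)")
    return [int(value) for value in pattern.findall(log_text)]

# A few lines copied from cloudburst.err.
sample_log = """\
12/06/15 17:12:32 INFO mapred.JobClient:     Reduce output records=81128
Alignment time: 65.36
12/06/15 17:13:20 INFO mapred.JobClient:      Reduce output records=74502
Filtering time: 48.481
Total Running time: 113.841
"""

times = parse_timings(sample_log)
records = parse_counter(sample_log, "Reduce output records")
total_matches = abs(times["Alignment time"] + times["Filtering time"]
                    - times["Total Running time"]) < 1e-6
print(times)
print(records)
print(total_matches)
```

On the real file, pass `open("cloudburst.err").read()` instead of sample_log.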
[hadoop@skcc-nebdap02 hadoop]$ bin/hadoop dfs -get /data/results/ ../CloudBurst-1.1.0/results
[hadoop@skcc-nebdap02 hadoop]$ cd ../CloudBurst-1.1.0
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ java -jar PrintAlignments.jar results | sort -nk4 > 100k.3.txt
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ head -n 20 100k.3.txt
1    766133   766169   1       1    +
1    297899   297935   2       0    -
1    1325118  1325154  4       1    +
1    145970   146006   7       1    -
1    553513   553549   8       0    -
1    1779842  1779878  9       0    -
1    86299    86335    10      0    -
1    1503808  1503844  11      2    +
1    397758   397794   12      0    +
1    241778   241814   13      0    -
1    626711   626747   14      0    +
1    142141   142177   15      1    +
1    1401129  1401165  16      1    -
1    306289   306325   17      1    +
1    628571   628607   18      1    -
1    815172   815208   19      0    -
1    1624600  1624636  20      0    +
1    13779    13815    21      0    +
1    129064   129100   22      1    +
1    1382938  1382974  24      2    +
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ tail -n 20 100k.3.txt
1    1796768  1796804  99976   2    -
1    1021128  1021164  99978   0    -
1    1350005  1350041  99980   1    +
1    799280   799316   99981   2    -
1    139518   139554   99983   0    +
1    57158    57194    99985   0    +
1    1663030  1663066  99986   2    +
1    549235   549271   99987   0    -
1    1400509  1400545  99988   0    +
1    880593   880629   99989   0    +
1    918064   918100   99990   0    +
1    937994   938030   99992   1    -
1    94456    94492    99993   0    +
1    1144320  1144356  99994   0    +
1    1441627  1441663  99995   0    +
1    1281557  1281593  99996   0    +
1    1323611  1323647  99997   2    -
1    800095   800131   99998   0    -
1    1956458  1956494  99999   1    +
1    134848   134884   100000  2    -
[hadoop@skcc-nebdap02 CloudBurst-1.1.0]$ wc -l 100k.3.txt
74502 100k.3.txt
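
The 74,502 lines mean that 74,502 of the 100,000 reads (about 74.5%) received an unambiguous best alignment with at most 3 mismatches. The sketch below summarizes such a file. The column meanings (reference id, start, end, read id, number of mismatches, strand) are an assumption inferred from the output above, where column 4 is the sort key and column 5 never exceeds K; summarize_alignments is a hypothetical helper.

```python
from collections import Counter

def summarize_alignments(lines, total_reads):
    """Count aligned reads and tally mismatch counts and strands."""
    mismatches, strands, read_ids = Counter(), Counter(), set()
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _ref, _start, _end, read_id, diffs, strand = fields
        read_ids.add(int(read_id))
        mismatches[int(diffs)] += 1
        strands[strand] += 1
    return {
        "aligned_reads": len(read_ids),
        "aligned_fraction": len(read_ids) / total_reads,
        "mismatches": dict(mismatches),
        "strands": dict(strands),
    }

# First five lines of 100k.3.txt; they cover read ids 1 to 8.
sample = """\
1 766133 766169 1 1 +
1 297899 297935 2 0 -
1 1325118 1325154 4 1 +
1 145970 146006 7 1 -
1 553513 553549 8 0 -
""".splitlines()

summary = summarize_alignments(sample, total_reads=8)
print(summary)
```

For the full result, call `summarize_alignments(open("100k.3.txt"), total_reads=100000)`.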
Cloud burst tutorial

Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

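The CloudBurst usage message suggests task counts as multiples of the number of processor cores (#mappers: cores * 10, #reduces: cores * 2, #fmappers and #freducers: cores, blocksize: 128, redundancy: cores). A minimal sketch of that arithmetic follows, assuming a hypothetical helper that only assembles the 14 positional arguments in the order the usage message lists them. The function names and the core count of 24 are illustrative assumptions; the cluster's actual core count is not stated in the session.

```python
# Sketch only: these helpers are hypothetical and are not part of CloudBurst.
# They apply the multipliers quoted in the CloudBurst usage message.

def suggested_task_counts(cores):
    """Return the task counts the usage message suggests for `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return {
        "mappers": cores * 10,   # suggested: #processor-cores * 10
        "reducers": cores * 2,   # suggested: #processor-cores * 2
        "fmappers": cores,       # suggested: #processor-cores
        "freducers": cores,      # suggested: #processor-cores
        "blocksize": 128,        # suggested: 128
        "redundancy": cores,     # suggested: #processor-cores
    }


def cloudburst_args(refpath, qrypath, outpath, read_len, k, cores,
                    allow_differences=0, filter_alignments=1):
    """Build the positional arguments in the order shown by the usage message."""
    s = suggested_task_counts(cores)
    values = [
        refpath, qrypath, outpath,
        read_len, read_len,          # minreadlen, maxreadlen (fixed-length reads)
        k, allow_differences, filter_alignments,
        s["mappers"], s["reducers"], s["fmappers"], s["freducers"],
        s["blocksize"], s["redundancy"],
    ]
    return [str(v) for v in values]


if __name__ == "__main__":
    # 24 cores is an assumed value chosen for illustration.
    args = cloudburst_args("/data/cloudburst/s_suis.br",
                           "/data/cloudburst/100k.br",
                           "/data/results", read_len=36, k=3, cores=24)
    print("bin/hadoop jar ../CloudBurst-1.1.0/CloudBurst.jar " + " ".join(args))
```

With 24 cores the sketch yields 240 mappers, 48 reducers, and 24 filtration mappers and reducers, which matches the counts used in the run above. The run passed 16 for redundancy, whereas the suggested value would be 24 here, so treat the computed values as starting points rather than required settings.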