Slides of the I Workshop on command-line tools with the collaboration of CAG (Center for Applied Genomics - Children's Hospital of Philadelphia) bioinformatics analysts.
2nd day
Workshop on command line tools - day 1 (Leandro Lima)
Slides of the I Workshop on command-line tools with the collaboration of CAG (Center for Applied Genomics - Children's Hospital of Philadelphia) bioinformatics analysts.
1st day
This document provides a cheat sheet for using the UNIX C Shell, covering topics like file manipulation, terminal setup, changing directories, listing and displaying files, copying, moving, removing files, printing files, finding files, comparing files, setting permissions, input/output redirection, C shell specific commands, job control, and special startup files.
This document discusses using sliding windows to aggregate streaming data in MapReduce. It proposes buffering input tuples in mappers until a window is full, then emitting the aggregate. Combiners and reducers combine partial aggregates across windows. Window ranges are initialized and updated during merging to remove outdated data and handle late arrivals. This approach allows streaming aggregation queries to be executed with MapReduce.
The document provides an overview of basic shell syntax and commands in UNIX shells. It discusses command line options, how shells find commands, aliases, standard input/output redirection, running jobs in background, pattern matching, switching shells, and combining multiple commands. Key points covered include using options and arguments with commands, the PATH variable, built-in commands, piping output, redirecting I/O, listing and managing jobs, wildcard patterns, and grouping commands.
This document contains a lecture on working with arrays, scripts, and SSH/SCP in UNIX systems. It discusses special variables used in scripts, how to define and manipulate arrays, examples of useful scripts for renaming files, backing up data, and extracting video files from DVDs, and how to use SSH to securely connect to remote systems and SCP to securely transfer files between systems. It also covers generating and using public/private key pairs for passwordless SSH login.
The document describes a trash command that provides a recycle bin functionality for Linux similar to Windows. It works by moving deleted files to the $HOME/.trash directory instead of permanently removing them. Users can restore deleted files by running the rm -l command and specifying the file row number. The trash command also checks the trash directory size and automatically deletes the oldest files if the space limit is exceeded.
The document provides an introduction to basic UNIX commands written by Razor on January 15, 2000 for new UNIX users. It includes commands for working with files and permissions, such as cp and mv to copy and move files, cd to change directories, pwd to show the current directory, mkdir to create directories, and rm to delete files and directories. The first part focuses on commands for copying, moving, changing directories, viewing the current directory, creating directories, and deleting files and directories.
The document describes how to create a simple character device driver for Linux. It involves writing C code for file operations like open, read, write and close. The code is compiled into a kernel module which is loaded and tested. Key steps include creating files for the driver code, adding a Makefile, building the kernel object, loading the module, creating a device file, and verifying the file operations by reading kernel logs. The module can then be unloaded after testing is completed.
This document discusses using tracing, awk, and xgraph to analyze network performance parameters from ns2 trace files. It provides details on the wired trace format, examples of awk scripts to calculate link throughput and end-to-end throughput between nodes, as well as a script to calculate average link delay between transmitting and receiving nodes.
This document summarizes the key capabilities of Warp 10, a time series data ingestion, processing, and visualization platform:
1. Warp 10 can ingest high volumes of time series data from sensors and other IoT devices via HTTP, WebSockets, and many collection tools in a performant manner.
2. It provides a feature-rich scripting language called WarpScript that allows users to manipulate, analyze, and transform ingested time series data using over 690 functions and frameworks.
3. Warp 10 includes tools to visualize time series data in real-time through widgets that can display charts, images, and more generated from WarpScript. Dynamic tile widgets also enable building configurable
Shell Script to Extract IP Address, MAC Address Information (VCP Muthukrishna)
This script collects the active MAC addresses, IP addresses, and associated hardware vendor information on a system. It uses the arp command to gather this network information and outputs it to an HTML file. The HTML file displays the IP address, MAC address, and includes a hyperlink to lookup the IEEE vendor information based on the first three octets of the MAC address. It also includes an option to email the results in an HTML formatted email.
1. The document describes an ns-2 tutorial exercise on simulating computer networks using the ns-2 simulator. It provides example scripts for basic network simulations.
2. The example scripts simulate simple network topologies with increasing complexity, including UDP and TCP traffic over droptail and queue configurations.
3. Later examples introduce more complex scenarios like dynamic routing protocols and simulating link failures to observe network behavior.
Linux Shell Scripts and Shell Commands ✌️ (Nazmul Hyder)
A short description of some Linux shell scripts and shell commands. Hopefully it will help you with file editing, directory creation and deletion, grep, pipelines, and lots of other stuff in your Linux/Mac terminal.
Abstract:
This talk will introduce you to the concept of Kubernetes Volume plugins. We will not only help you understand the basic concepts, but more importantly, using practical examples, we will show how you can develop your own volume plugins and contribute them back to the community of the OSS project as large as Kubernetes.
We will conclude the talk by discussing various challenges one can come across when contributing to a high velocity OSS project of Kubernetes' size which can help you avoid the pain and enjoy the path.
Sched Link: http://sched.co/6BYB
Artimon is a scalable metrics collection and analysis framework. It collects metrics called 'variable instances' that have a name, labels, and timestamped values. Metrics can be exported via a Thrift service and stored in distributed systems like Kafka for later analysis using Groovy scripts. Artimon is designed to collect both IT and business metrics and can adapt to collect from third party sources using agents.
1) The document provides instructions for setting up an AWS account and launching an EC2 instance with an AMI that contains tools and documentation for a hands-on tutorial on NoSQL databases and MongoDB.
2) The tutorial covers basic MongoDB commands and demonstrates how to create, insert, update, and query document data using the mongo shell client. Embedded and nested documents are explored along with geospatial queries.
3) A map-reduce example aggregates historical check-in data to calculate popular locations over different time periods, demonstrating how MongoDB supports batch operations.
This document provides an overview of Bash scripting concepts including file systems, variables and strings, math operations, file ownership and permissions, users and privileges, processes and subshells, loops, conditional statements, I/O redirection, named pipes, signals, and GUI tools. It also includes examples of Bluetooth file sharing, auto-shutdown scripts, lockscreen notifications, web crawling scripts, and time tracking automation. References are provided for further reading.
"PostgreSQL and Python" Lightning Talk @EuroPython2014 (Henning Jacobs)
PL/Python allows users to write PostgreSQL functions and procedures using Python. It enables accessing PostgreSQL data and running Python code from within SQL queries. For example, a function could query a database table, process the results in Python by accessing modules, and return a value to the SQL query. This opens up possibilities to leverage Python's extensive libraries and expressiveness to expose data and perform complex validation from PostgreSQL.
Bash Script Disk Space Utilization Report and EMail (VCP Muthukrishna)
This bash script generates an HTML disk usage report and emails it. It collects disk usage information using df, formats it into an HTML table, and emails the report. If disk usage exceeds 90%, the row is highlighted red and a critical alert is shown. Usage between 70-80% is highlighted orange. The report is generated daily and emailed to a recipient.
File Space Usage Information and EMail Report - Shell Script (VCP Muthukrishna)
This script generates an HTML report of the top 10 largest files and directories on a server by size and emails it. It uses the du command to get disk usage information, sorts the results in descending order of size, and writes the top 10 to an HTML table, which is then emailed via sendmail to notify of disk space usage.
Maxym Kharchenko presented ways to manage Oracle databases with Python. He demonstrated a Python tool to ping multiple Oracle databases concurrently and time the execution. The tool reports the status and timing for each database pinged. Python enforces good coding practices and interfaces well with databases, APIs, and other systems. Learning Python helps develop a more Pythonic way of thinking that can improve code quality and productivity.
Coming Out Of Your Shell - A Comparison of *Nix Shells (Kel Cecil)
This document provides an overview of several popular shell options including bash, zsh, and fish. It discusses their origins, key features, and popular frameworks used to enhance them. The document encourages exploring options like oh-my-zsh and oh-my-fish to benefit from community configurations while also highlighting capabilities in each shell beyond their initial reputation. The takeaways emphasize that with tweaking, bash is capable of more than assumed, zsh rewards investment in unlocking its power, and fish offers useful features out of the box.
The document discusses the dplyr package for R. It provides examples of using dplyr verbs like filter, select, mutate, and summarise to subset and transform data frames. It also demonstrates grouping data with group_by and joining data with inner_join. The key features of dplyr are its simple verbs for filtering, modifying, arranging and summarizing data, its use of piping with %>%, and its convenience for working with tabular data.
The document discusses using functional programming techniques in Perl to efficiently calculate tree hashes of large files uploaded in chunks to cloud storage services. It presents a tree_fold keyword and implementation that allows recursively reducing a list of values using a block in a tail-call optimized manner to avoid stack overflows. This approach is shown to provide concise, efficient and elegant functional code for calculating tree hashes in both Perl 5 and Perl 6.
The document provides instructions for configuring Postfix to integrate with Active Directory for user authentication. It includes configuring Postfix configuration files and LDAP settings to query user information from Active Directory for mail delivery, alias lookups, and more. Commands are provided to install required packages, configure ClamAV for antivirus scanning, and set up virtual users on the mail server using directories mounted from an iSCSI LUN.
Process monitoring in UNIX shell scripting (Dan Morrill)
This script monitors a hardcoded process called "ssh" and restarts it if it stops running. It will attempt to restart the process 3 times before reporting a failure. The script logs status messages to a log file called "procmon.log". It uses color codes to identify status messages. The script contains functions to monitor the process, detect failures, and close the script logging the ending status.
At the Dublin Fashion Insights Centre, we are exploring methods of categorising the web into a set of known fashion related topics. This raises questions such as: How many fashion related topics are there? How closely are they related to each other, or to other non-fashion topics? Furthermore, what topic hierarchies exist in this landscape? Using Clojure and MLlib to harness the data available from crowd-sourced websites such as DMOZ (a categorisation of millions of websites) and Common Crawl (a monthly crawl of billions of websites), we are answering these questions to understand fashion in a quantitative manner.
The latest generation of big data tools such as Apache Spark routinely handle petabytes of data while also addressing real-world realities like node and network failures. Spark's transformations and operations on data sets are a natural fit with Clojure's everyday use of transformations and reductions. Spark MLlib's excellent implementations of distributed machine learning algorithms puts the power of large-scale analytics in the hands of Clojure developers. At Zalando's Dublin Fashion Insights Centre, we're using the Clojure bindings to Spark and MLlib to answer fashion-related questions that until recently have been nearly impossible to answer quantitatively.
Hunter Kelly @retnuh
tech.zalando.com
The document discusses best practices for using the command line interface (CLI) efficiently. It provides examples of using common UNIX commands like grep, find, awk, sort, uniq, and xargs to analyze command history, filter output, run commands in parallel, and handle encoding. The document emphasizes building small, focused tools according to the UNIX philosophy and provides tips for common tasks like capturing command output, checking exit statuses, running subprocesses, and handling different operating systems. Recommendations are made for CLI productivity tools and the linter ShellCheck.
Shell Script Disk Usage Report and E-Mail Current Threshold Status (VCP Muthukrishna)
This shell script generates a disk usage report for each disk partition on a server and emails the report. It checks disk usage percentages against thresholds of 90%, 80%, and 70% and colors partitions red, orange, or green accordingly in the report. It also calculates the difference in disk usage from the previous report 12 hours ago and includes this in the emailed report. Running the script generates an HTML report file and uses sendmail to email the file to specified recipients.
The fundamentals and advanced applications of Node will be covered. We will explore the design choices that make Node.js unique, how this changes the way applications are built and how systems of applications work most effectively in this model. You will learn how to create modular code that’s robust, expressive and clear. Understand when to use callbacks, event emitters and streams.
This document discusses refactoring Java code to Clojure using macros. It provides examples of refactoring Java code that uses method chaining to equivalent Clojure code using the threading macros (->> and -<>). It also discusses other Clojure features like type hints, the doto macro, and polyglot projects using Leiningen.
This document loads various libraries and reads in multiple csv files containing transportation data. It then performs some data cleaning and preprocessing steps. Various outputs are defined to render tables and plots of subsets of the data. Plots are created to visualize relationships between weighted time, cost, and safety metrics. Interactive elements are added to output text describing user input from the plots. Maps and motion charts are also defined as outputs to visualize additional data aspects.
This document contains an assignment submission for an Operating Systems lab course. It includes commands practiced during classwork and homework on topics like file manipulation and shell scripting. The homework portion focuses on shell scripting, with examples of scripts using basic constructs like loops, conditionals, variables, and input/output redirection.
Introduction to Unix - POS420 Unix Lab Exercise Week 3 BTo.docx (mariuse18nolet)
This document provides an introduction and overview of common Unix commands including find, grep, sort, uniq, diff, and awk. It includes over 30 examples of using each command to find, search, filter, compare, and manipulate text-based files. The examples cover basic and advanced uses of each command, such as recursively searching directories with find, searching for patterns with grep, sorting and de-duplicating lines with sort and uniq, comparing differences between files with diff, and selecting, calculating, and transforming data with awk.
From mysql to MongoDB (MongoDB 2011 Beijing meetup) (Night Sailer)
The document summarizes differences between MySQL and MongoDB data types and operations. MongoDB uses BSON for data types rather than separate numeric, text and blob types. It supports embedded documents and arrays. Unlike MySQL, MongoDB does not have tables or rows, but collections and documents. Operations like insert, update, find, sort and index are discussed as alternatives to SQL equivalents.
paexec distributes tasks over a network or CPUs. It allows processing large amounts of data or tasks in parallel by running tasks on multiple machines or CPUs. It supports heterogeneous environments like BSD, Linux, and Windows. Tasks can have dependencies, and paexec can build a dependency graph to ensure dependent tasks run in the correct order. It is resistant to network and calculator failures and will retry or redistribute failed tasks.
This document discusses time series analysis techniques in R, including decomposition, forecasting, clustering, and classification. It provides examples of decomposing the AirPassengers dataset, forecasting with ARIMA models, hierarchical clustering on synthetic control chart data using Euclidean and DTW distances, and classifying the control chart data using decision trees with DWT features. Accuracy of over 88% was achieved on the classification task.
This document discusses using the doSNOW package in R to perform parallel programming and speed up simulations. It explains how to register clusters, use foreach loops with .combine functions, and load necessary packages within loops. Testing with different numbers of clusters shows speedups over serial execution, with optimal speedups achieved when the number of clusters matches or exceeds the number of cores. Processing jobs in parallel reduces the elapsed time for each job.
Kafka Streams: Revisiting the decisions of the past (How I could have made it better) (confluent)
Jason Bell, Kafka DevOps Engineer @ Digitalis.io
https://www.meetup.com/Cleveland-Kafka/events/272339276/
Ns is a network simulator developed at UC Berkeley and elsewhere that allows modeling of TCP/IP networks and wireless networks using C++ and OTcl. It provides objects for nodes, links, network traffic and wireless channel modeling. The document outlines how to install ns, create basic simulations with nodes and traffic, and extend it for wireless simulations using various protocols.
CLI Wizardry - A Friendly Intro To sed/awk/grep (All Things Open)
This document provides an introduction to common command line interface (CLI) tools including grep, sed, awk, and xargs. It explains that grep fetches lines containing a search term, sed replaces text within lines, awk processes output by columns, and xargs pipes output to command line arguments. The document demonstrates examples of each tool and how they can be combined in pipelines to extract and transform text for tasks like analyzing log files or creating a storage pool.
Beyond PHP - It's not (just) about the code (Wim Godden)
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
Code is not text! How graph technologies can help us to understand our code b... (Andreas Dewes)
Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that “code is text” is so deeply ingrained in our heads that we never question its validity or even become aware of the fact that there are other ways to look at code.
In my talk I will explain why treating code as text is a very bad idea which actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will show specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development.
Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.
R is a language and environment for statistical computing and graphics. It is based on S, an earlier language developed at Bell Labs. R features include being cross-platform, open source, having a package-based repository, strong graphics capabilities, and active user and developer communities. Useful URLs and books for learning R are provided. Instructions for installing R and RStudio on different platforms are given. R can be used for a wide range of statistical analyses and data visualization.
Designing Operation Oriented Web Applications / YAPC::Asia Tokyo 2011 (Masahiro Nagano)
The document describes using Log::Minimal to log messages with timestamps, severity levels, and stack traces. Log::Minimal provides functions like debugf(), infof(), warnf() that log messages, and configuration options like AUTODUMP and PRINT to customize the output format. It can be used to log messages from multi-threaded or distributed applications.
Similar to Workshop on command line tools - day 2 (20)
Genetic studies of Mendelian and complex diseases (Leandro Lima)
The document discusses genetic studies of complex diseases, including: (1) genome-wide association studies that search for common genetic variants associated with diseases; (2) challenges such as explaining only a small part of the genetic variation and including rare variants; (3) the use of protein-protein interactions to better understand the genetic architecture of diseases.
Using Cytoscape for Network Visualization and Analysis (Leandro Lima)
The document discusses the Cytoscape software, which is used for visualization and analysis of biological networks. It allows visualizing molecular interaction networks and metabolic pathways, integrating these networks with gene expression data and other information. Cytoscape has built-in tools and plugins for different analyses, and allows accessing public databases, exporting images, and saving sessions.
Brokers and Bridges (genes in a protein-protein interaction network) (Leandro Lima)
The document discusses bridges and brokers in a network. Bridges connect important parts of the network despite having few connections, and are important to keep parts of the network from becoming disconnected. Brokers have many links and act as intermediaries, connecting people who do not know each other; their neighborhood becomes disconnected if they are removed from the network.
Intro. to Bioinformatics (FMU - 08/05/2012) (Leandro Lima)
The document introduces the field of bioinformatics, discussing DNA, genomes, sequencing, genome assembly and annotation. It also covers sequence alignment using dynamic programming and applications such as gene expression studies and biological networks.
Complex Networks applied to Social Networks (09/05/2012 - FMU) (Leandro Lima)
The document discusses complex networks applied to social networks. It introduces the speaker and his academic background, defines what networks are and how they can be represented, and discusses how social, biological, and influence networks can be analyzed using complex network methods.
Workshop on command line tools - day 2
1. I Workshop on command-line tools (day 2)
Center for Applied Genomics
Children's Hospital of Philadelphia
February 12-13, 2015
2. awk - a powerful way to check conditions and show specific columns
Example: show only CNVs spanning at most 3 targets (exons)
tail -n +2 DATA.xcnv | awk '$8 <= 3'
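Conditions and column printing combine naturally. A minimal sketch, assuming the XHMM .xcnv layout where column 1 is the sample and column 3 the CNV interval (check your own header):
# show sample and interval for CNVs spanning at most 3 targets
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print $1, $3}'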
3. awk - different ways to do the same thing
tail -n +2 DATA.xcnv | awk '$8 <= 3'
# same effect 1
tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'
# same effect 2
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print}'
# same effect 3
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $0}'
# different effect
tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) print $1}'
5. diff - compare files line by line
# Compare
diff DATA.gold.xcnv DATA.gold2.xcnv
# Tip: install tkdiff to use a graphical version of diff
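Two standard diff options worth knowing:
# unified format, easier to read (and the format used by patch)
diff -u DATA.gold.xcnv DATA.gold2.xcnv
# only report whether the files differ
diff -q DATA.gold.xcnv DATA.gold2.xcnv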
6. Exercises
1. Using adhd.map, show 10 SNPs with rsID starting with 'rs' on chrom. 2, between positions 1Mb and 2Mb
2. Check which chromosome has the most SNPs
3. Check which SNP IDs are duplicated
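One possible set of solutions, assuming adhd.map follows the PLINK .map layout (chromosome, SNP ID, genetic distance, position):
# 1. 10 SNPs on chrom. 2 with rsID starting with 'rs', between 1Mb and 2Mb
awk '$1 == 2 && $2 ~ /^rs/ && $4 >= 1000000 && $4 <= 2000000' adhd.map | head
# 2. chromosome with the most SNPs
awk '{print $1}' adhd.map | sort | uniq -c | sort -rn | head -n 1
# 3. duplicated SNP IDs
awk '{print $2}' adhd.map | sort | uniq -d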
9. Using awk to check number of variants in ped files
# Options using only awk, but they take (much) more time
# (in a .ped file the first 6 columns are family/individual IDs, parents, sex,
# and phenotype; each SNP then adds 2 allele columns, hence (NF-6)/2)
awk 'NR == 1 {print (NF-6)/2}' adhd.ped
awk 'NR < 2 {print (NF-6)/2}' adhd.ped # Slow, too
# Better alternative
head -n 1 adhd.ped | awk '{print (NF-6)/2}'
# Now, the map file
wc -l adhd.map
10. time - time command execution
time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
real 0m0.485s
user 0m0.391s
sys 0m0.064s
time awk 'NR < 2 {print (NF-6)/2}' adhd.ped
# Forget… just press Ctrl+C
real 1m0.611s
user 0m51.261s
sys 0m0.826s
11. top - display and update sorted information about processes / display Linux tasks
top
z : color
k : kill process
u : choose specific user
c : show complete commands running
1 : show usage of individual CPUs
q : quit
12. screen - screen manager with terminal emulation (i)
screen
screen -S <session_name>
Ctrl+a, then c: create window
Ctrl+a, then n: go to next window
Ctrl+a, then p: go to previous window
Ctrl+a, then 0: go to window number 0
Ctrl+a, then d: detach from your session, but keep it running
13. screen - screen manager with terminal emulation (ii)
Ctrl+a, then [ : activate copy mode (to scroll screen)
q : quit copy mode
exit : close current window
screen -r : resume the only detached session
screen -r <session_name> : resume a specific detached session
screen -rD <session_name> : reattach a session (detaching it elsewhere first)
14. split - split a file into pieces
split -l <lines_of_each_piece> <input> <prefix>
# Example
split -l 100000 adhd.map map_
wc -l map_*
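Sanity check: the default suffixes sort in order (map_aa, map_ab, ...), so concatenating the pieces should reproduce the original:
cat map_* > adhd_rebuilt.map
wc -l adhd_rebuilt.map adhd.map # line counts should match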
15. in-line Perl/sed to find and replace (i)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'
# Other possibilities
head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'
head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'
head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'
# Creating a BED file (replace ':' and '-' with tabs)
head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
16. in-line Perl/sed to find and replace (ii)
# "s" means substitute
# "g" means global (replace all matches, not only first)
# See the difference...
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'
head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'
# Adding more replacements
head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
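To replace inside a file instead of a pipeline, sed can edit in place; note that GNU sed (Linux) takes -i alone, while BSD sed (macOS) needs an argument (myfile.txt is just a placeholder):
sed -i 's/chr//g' myfile.txt # GNU sed
sed -i '' 's/chr//g' myfile.txt # BSD sed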
17. copy from terminal to clipboard / paste from clipboard to terminal
# This is like Ctrl+V in your terminal
pbpaste
# This is like Ctrl+C from your terminal
head DATA.xcnv | pbcopy
# Then, Ctrl+V in other text editor
# On Linux, you can install "xclip"
http://sourceforge.net/projects/xclip/
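With xclip installed, rough Linux equivalents (assuming an X11 session) would be:
head DATA.xcnv | xclip -selection clipboard # like pbcopy
xclip -selection clipboard -o # like pbpaste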
18. datamash - command-line calculations
tail -n +2 DATA.xcnv |
head |
cut -f6,10,11 |
datamash mean 1 sum 2 min 3
# mean of 1st column
# sum of 2nd column
# minimum of 3rd column
http://www.gnu.org/software/datamash/
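datamash can also aggregate per group with -g (input must be sorted on the grouping column). A sketch, assuming column 2 of DATA.xcnv holds the CNV type (DEL/DUP):
tail -n +2 DATA.xcnv | cut -f2,10 | sort -k1,1 | datamash -g 1 mean 2
# mean of the 10th original column, per CNV type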
19. touch - change file access and modification times
ls -lh DATA.gold.xcnv
touch DATA.gold.xcnv
ls -lh DATA.gold.xcnv
20. Introduction to "for" loop
tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt
for sample in `cat samples.txt`; do touch $sample.txt; done
ls -lh Sample*
for sample in `cat samples.txt`; do
mv $sample.txt $sample.csv;
done
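An alternative that reads the file line by line and quotes the variable (safer if names ever contain spaces):
while read -r sample; do
  mv "${sample}.txt" "${sample}.csv"
done < samples.txt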
24. Exercise
1. Create a program that shows input parameters/arguments
2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)
3. Send this program to your PATH
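One possible sketch for exercise 2 (the name "colnames" and the numbering via cat -n are just one choice):
#!/bin/bash
# colnames: print the column names of a <tab>-delimited file, one per line, numbered
head -n 1 "$1" | tr '\t' '\n' | cat -n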
25. Running a bash script (i)
cat > arguments.sh
echo Your program is $0
echo Your first argument is $1
echo Your second argument is $2
echo You entered $# parameters.
# Ctrl+C to exit "cat"
26. Running a bash script (ii)
bash arguments.sh
bash arguments.sh A B C D E
27. chmod - set permissions (i)
ls -lh arguments.sh
-rw-r--r--
# First character
b Block special file.
c Character special file.
d Directory.
l Symbolic link.
s Socket link.
p FIFO.
- Regular file.
28. chmod - set permissions (ii)
Next characters:
user, group, others | read, write, execute
ls -lh arguments.sh
-rw-r--r--
# Everybody can read
# Only user can write/modify
29. chmod - set permissions (iii)
# Add writing permission to group
chmod g+w arguments.sh
ls -lh arguments.sh
# Remove writing permission from group
chmod g-w arguments.sh
ls -lh arguments.sh
# Add execution permission to all
chmod a+x arguments.sh
ls -lh arguments.sh
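The same permissions can be set numerically, summing r=4, w=2, x=1 for user/group/others:
chmod 755 arguments.sh # rwxr-xr-x
chmod 644 arguments.sh # rw-r--r--
ls -lh arguments.sh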
30. Run your program again
# Execute your program
./arguments.sh
./arguments.sh A B C D E
# change the name
mv arguments.sh arguments
# Send to your PATH (showing on Mac)
sudo cp arguments /usr/local/bin/
# Go to other directory
# Type argu<Tab>, and "which arguments"
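An alternative to copying into /usr/local/bin is keeping a personal bin directory on your PATH (e.g. set in ~/.bashrc; $HOME/bin is just a conventional choice):
mkdir -p $HOME/bin
cp arguments $HOME/bin/
export PATH="$HOME/bin:$PATH"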