4. Parsing
Parse a file
Sort its words alphabetically
Sort its words by number of occurrences
Bioinformatics master course, ‘11/’12
Paolo Marcatili
5. Perl Basics
6. PERL
Practical Extraction and Reporting Language
• Handle text files
• Web (CGI)
• Small scripts
http://www.perltutorial.org/
7. Install
Windows
• ActivePerl: http://www.activestate.com/activeperl/
• Cygwin (Linux emulation)
Linux / OS X
• Native (Perl is usually already installed)
8. Hello World!
9. First script
Open an editor (e.g. gedit)
#!/usr/bin/perl -w
use strict;     # variables must be declared with my
use warnings;   # warn about suspicious code
print "Hello World!\n";   # \n starts a new line
Save as -> first.pl
10. How to run a script
Terminal -> move to the script folder
perl first.pl
or
chmod a+x first.pl <- now it is executable by everyone
./first.pl <- ./ means ‘in this folder’
17. Scalars - 2
• Scalar data can be a number or a string.
• In Perl, strings and numbers can be used nearly interchangeably.
• A scalar variable holds scalar data.
• A scalar variable starts with a dollar sign ($) followed by a Perl identifier.
• A Perl identifier can contain alphanumeric characters and underscores.
• It cannot start with a digit.
18. Examples
#floating-point values
my $x = 3.14;
my $y = -2.78;
#integer values
my $i = 1000;   # avoid naming variables $a and $b: sort uses them
my $j = -2000;
my $s = "2000"; # similar to $s = 2000;
#strings
my $str = "this is a string in Perl";
my $str2 = 'this is also a string';
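The two strings above use different quotes, and the difference matters: double quotes interpolate variables and escapes such as \n, while single quotes keep the text exactly as written. A minimal sketch (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name   = "Perl";
my $double = "Hello $name\n";   # $name is replaced by its value, \n becomes a newline
my $single = 'Hello $name\n';   # kept literally: dollar sign, "name", backslash, "n"

print $double;                  # prints: Hello Perl
print $single, "\n";            # prints: Hello $name\n
```

Use double quotes when you want variables and \n expanded, single quotes when you want the text untouched.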
19. Operations
my $x = 5 + 9; # Add 5 and 9, and then store the result in $x
$x = 30 - 4; # Subtract 4 from 30 and then store the result in $x
$x = 3 * 7; # Multiply 3 and 7 and then store the result in $x
$x = 6 / 2; # Divide 6 by 2
$x = 2 ** 8; # two to the power of 8
$x = 3 % 2; # Remainder of 3 divided by 2
$x++; # Increase $x by 1
$x--; # Decrease $x by 1
my $y = $x; # Assign $x to $y
$x += $y; # Add $y to $x
$x -= $y; # Subtract $y from $x
$x .= $y; # Append $y onto $x
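To check your understanding, here is the same sequence as one runnable script, with the value of $x after each line written as a comment. Note the last step: .= is string concatenation, so appending 1 to 1 gives "11", not 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 5 + 9;     # 14
$x = 30 - 4;       # 26
$x = 3 * 7;        # 21
$x = 6 / 2;        # 3
$x = 2 ** 8;       # 256
$x = 3 % 2;        # 1
$x++;              # 2
$x--;              # 1
my $y = $x;        # $y is 1
$x += $y;          # 2
$x -= $y;          # 1
$x .= $y;          # "11": 1 is used as the string "1" and "1" is appended
print "$x\n";      # prints: 11
```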
20. Operations - 2
my $x = 3;
my $c = "he ";
my $s = $c x $x;   # $c repeated $x times: "he he he "
my $bye = "bye";
print $s . "\n";   # print $s and start a new line
# similar to
print "$s\n";
my $t = $s . $bye; # concatenate $s and $bye
print "$t\n";
# Interpolation: variables inside double quotes are replaced by their value
$x = 10;
$s = "you get $x";
print "$s\n";      # prints: you get 10
21. Type Casting
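Type casting (or data conversion, or coercion) is usually silent in Perl:
my $x = "3";
print $x + 4 . "\n";      # prints 7
Be careful!!
my $x = "3";
my $y = 1;
my $z = "uno";
print $x + $y . "\n";     # 4
print $x + $z . "\n";     # 3: "uno" is not a number, so it counts as 0 (with a warning)
print $x + 4 . 1 . "\n";  # 71: ($x + 4) . 1 joins "7" and "1"
print $x + 4.1 . "\n";    # 7.1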
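One way to avoid silent surprises like "uno" becoming 0 is to check a value before doing arithmetic with it. This sketch uses looks_like_number from Scalar::Util, a module that ships with Perl; the list of input values is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);   # core module

my @inputs = ("3", "4.1", "uno", "1e3");  # hypothetical input values
my $total  = 0;
foreach my $value (@inputs) {
    if (looks_like_number($value)) {
        $total += $value;                  # safe: Perl reads it as a number
    } else {
        print "Skipping non-numeric value '$value'\n";
    }
}
print "Total: $total\n";                   # 3 + 4.1 + 1000
```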
25. array - 2
my @str_array=("Perl","array","tutorial");
my @num_array=(5,7,9,10);
my @mixed_array=(5,7,9,"Perl","list");
my @rg_array=(1..20);
my @empty_array=();
print $str_array[1]; # prints "array": the 1st element is [0]
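Because indexing starts at 0, the last index is one less than the number of elements. A short sketch of the usual ways to get both, plus negative indices, which count from the end:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @str_array = ("Perl", "array", "tutorial");
my $count     = scalar(@str_array);   # number of elements: 3
my $last_idx  = $#str_array;          # index of the last element: 2
print "$str_array[0]\n";              # first element: Perl
print "$str_array[-1]\n";             # last element: tutorial
print "$count elements, last index $last_idx\n";
```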
26. operations
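my @int = (1,3,5,2);
push(@int,10);            # add 10 to the end of @int
print "@int\n";           # 1 3 5 2 10
my $last = pop(@int);     # remove 10 from the end of @int
print "@int\n";           # 1 3 5 2
unshift(@int,0);          # add 0 to the beginning of @int
print "@int\n";           # 0 1 3 5 2
my $start = shift(@int);  # remove 0 from the beginning of @int
print "@int\n";           # 1 3 5 2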
27. On arrays: foreach and sort
my @int = (1,3,5,2);
foreach my $element (@int){
print "element is $element\n";
}
my @sorted = sort(@int);
foreach my $element (@sorted){
print "element is $element\n";
}
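The example above works because every number has a single digit. By default sort compares elements as strings (alphabetically), so multi-digit numbers need a numeric comparison block. $a and $b are the special variables sort uses, the same ones that appear in the hash-sorting slides later on. A small sketch with made-up numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @int = (10, 9, 100, 2);
my @as_strings = sort @int;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @int;   # numeric comparison with <=>
print "@as_strings\n";                      # prints: 10 100 2 9
print "@as_numbers\n";                      # prints: 2 9 10 100
```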
28. Hashes
29. Hashes
• Hashes are like arrays: they store collections of scalars...
... but unlike arrays, they are indexed by name (just like in
real life!!!)
• Two components to each hash entry:
– Key
example: name
– Value
example: phone number
• Hash variables start with %
– Example: %phoneDirectory
• Elements are accessed using {} (like [] in arrays)
30. Hashes continued ...
• Adding a new key-value pair
$phoneDirectory{"Shirly"} = 7267975;
– Note the $: a single hash element is a scalar!
• Each key can have only one value
$phoneDirectory{"Shirly"} = 7265797;
# overwrites previous assignment
• Multiple keys can have the same value
• Accessing the value of a key
my $phoneNumber = $phoneDirectory{"Shirly"};
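The rules above can be tried in one small script. The second key, "Mario", is hypothetical and only there to show that two keys may share a value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %phoneDirectory;
$phoneDirectory{"Shirly"} = 7267975;   # add a new key-value pair
$phoneDirectory{"Shirly"} = 7265797;   # same key: overwrites the previous value
$phoneDirectory{"Mario"}  = 7265797;   # hypothetical second key with the same value

my $phoneNumber = $phoneDirectory{"Shirly"};
print "Shirly: $phoneNumber\n";                   # prints: Shirly: 7265797
print scalar(keys %phoneDirectory), " keys\n";    # prints: 2 keys
```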
31. Hashes and Foreach
• Foreach works in hashes as well!
foreach my $person (keys(%phoneDirectory))
{
print "$person: $phoneDirectory{$person}\n";
}
• Never depend on the order in which you put keys/values
into the hash! Keys come out in Perl's own internal order,
which is part of what makes hashes so fast!!
32. Hashes and Sorting
• The sort function works with hashes as well
• Sorting on the keys
foreach my $person (sort keys %phoneDirectory) {
print "$person : $phoneDirectory{$person}\n";
}
– This will print the phoneDirectory hash table in
alphabetical order based on the name of the person,
i.e. the key.
33. Hash and Sorting cont...
• Sorting by value
foreach my $person (sort {$phoneDirectory{$a} <=>
$phoneDirectory{$b}} keys %phoneDirectory)
{
print "$person : $phoneDirectory{$person}\n";
}
– Prints the person and their phone number in the
order of their respective phone numbers, i.e. the
value.
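Swapping $a and $b reverses the order (largest value first), and a second comparison after || breaks ties, so the output no longer depends on the hash's internal order. A sketch with hypothetical data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data, for illustration only
my %phoneDirectory = (
    "Shirly" => 7265797,
    "Mario"  => 7265797,
    "Anna"   => 1234567,
);

# Largest value first ($b before $a); equal values fall back to alphabetical key order
my @people = sort {
    $phoneDirectory{$b} <=> $phoneDirectory{$a} || $a cmp $b
} keys %phoneDirectory;

foreach my $person (@people) {
    print "$person : $phoneDirectory{$person}\n";
}
```

The same pattern, with a word-count hash instead of a phone directory, gives the most frequent words first.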
34. Exercise
• Choose your own text, or download one with wget
• Identify the 10 most frequent words
• Sort the words alphabetically
• Sort the words by the number of
occurrences
35. Counting Words
my %seen;
my $l = "Lorem ipsum";
my @w = split(" ", $l); # this is a new function: it splits $l into words at whitespace
foreach my $word (@w){
$seen{$word}++;
}
print "Sorted by occurrences\n";
foreach my $word (sort {$seen{$a} <=> $seen{$b}} keys %seen){
print "Word $word N: $seen{$word}\n";
}
print "Sorted alphabetically\n";
foreach my $word (sort(keys %seen)){
print "Word $word N: $seen{$word}\n";
}
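The slide counts the words of a single hard-coded string. Below is a sketch of the same idea applied to a whole file, printing the 10 most frequent words as the exercise asks. The subroutine names, lower-casing of words and alphabetical tie-breaking are assumptions of this sketch, and punctuation stays attached to words because split(' ', ...) only splits on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each word occurs in the lines read from $fh.
sub count_words {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        foreach my $word (split(' ', $line)) {  # ' ' splits on any run of whitespace
            $seen{lc $word}++;                   # lc: "The" and "the" count as one word
        }
    }
    return %seen;
}

# Return the $n most frequent words, ties broken alphabetically.
sub top_words {
    my ($n, %seen) = @_;
    my @sorted = sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen;
    $n = @sorted if $n > @sorted;               # fewer than $n distinct words
    return @sorted[0 .. $n - 1];
}

if (@ARGV) {   # e.g. perl top10.pl pg1000.txt
    my $file = $ARGV[0];
    open(my $in, '<', $file) or die "Cannot open $file: $!\n";
    my %seen = count_words($in);
    close($in);
    foreach my $word (top_words(10, %seen)) {
        print "Word $word N: $seen{$word}\n";
    }
}
```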
36. Homeworks
Download the "Divina Commedia"
(wget http://www.gutenberg.org/cache/epub/1000/pg1000.txt)
For each word length, count the number of occurrences (e.g.
123456 words of length 2, etc.)
Length of a string: length($a)
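As a hint on how length() combines with a hash, here is a sketch that works on a single sample line rather than the whole file; the hash key is the word length and the value is how many words have that length. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Nel mezzo del cammin di nostra vita";   # sample line, for illustration
my %by_length;
foreach my $word (split(' ', $text)) {
    $by_length{length($word)}++;                     # key: word length, value: count
}
foreach my $len (sort { $a <=> $b } keys %by_length) {
    print "$by_length{$len} words of length $len\n";
}
```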
37. Exam rules
Difficulty: February < June < September
To take the exam you MUST have sent me all the homework and one practical exercise