SlideShare une entreprise Scribd logo
1  sur  88
Télécharger pour lire hors ligne
BioPerl and
 Comparative Genomics



• BioPerl - toolkit for Bioinformatics data
• Fungal comparative genomics
Overview of Toolkit

Bioperl is...

   A Set of Perl modules for manipulating
   genomic and other biological data

   An Open Source Toolkit with many
   contributors

   A flexible and extensible system for doing
   bioinformatics data manipulation
Some things you can do

 Read in sequence data from a file in standard
 formats (FASTA, GenBank, EMBL,
 SwissProt,...)

 Manipulate sequences, reverse complement,
 translate coding DNA sequence to protein.

 Parse a BLAST report, get access to every
 bit of data in the report
Major Domains covered
      in Bioperl
 Sequences, Features, Annotations,

 Pairwise alignment reports

 Multiple Sequence Alignments

 Bibliographic data

 Graphical Rendering of sequence tracks

 Database for features and sequences
Additional things
Gene prediction parsers

Trees, Parsing Phylogenetic and Molecular
Evolution software output

Population Genetic data and summary
statistics

Taxonomy

Protein Structure
Practical Examples


Manipulate a DNA or Protein Sequence

Read and write different Sequence formats

Extract sequence annotations and features

Parse a BLAST report
Orienting to the code
http://cvs.open-bio.org

bioperl-live - Core packages

bioperl-run - for running applications

bioperl-ext - C language extension

bioperl-db - bioperl BioSQL implementation

bioperl-pedigree, bioperl-microarray are
side-projects
Website

http://bioperl.org or http://bio.perl.org/

HOWTOs

Frequently Asked Questions (FAQ)

News

Links to online Documentation
Anatomy of a Bioperl
      Module

perldoc Module -- perldoc Bio::SeqIO

SYNOPSIS -- runnable code

DESCRIPTION -- summary about the module

Each module will have annotated functions
A Tour: Core Objects


Bioperl Sequences, Features, Locations,
Annotations

Sequence searching & pairwise alignments

Multiple Sequence Alignments
Sequences and Features
TSS Feature
                          PolyA site



          Exon Features                Sequence
      Genomic Sequence with
      3 exons
      1 Transcript Start Site (TSS)
      1 Poly-A Site
Sequences, Features, and
      Annotations
  Sequence - DNA, RNA, AA

    Feature container

  Feature - Information with a Sequence
  Location

  Annotation - Information without explicit
  Sequence location
Parsing Sequences
Bio::SeqIO

  multiple drivers: genbank, embl, fasta,...

Sequence objects

  Bio::PrimarySeq

  Bio::Seq

  Bio::Seq::RichSeq
Look at the Sequence
       object
Common (Bio::PrimarySeq) methods

  seq() - get the sequence as a string
  length() - get the sequence length
  subseq($s,$e) - get a subseqeunce
  translate(...) - translate to protein [DNA]
  revcom() - reverse complement [DNA]
  display_id() - identifier string
  description() - description string
Using a Sequence
my $str = “ATGAATGATGAA”;
my $seq = Bio::PrimarySeq->new(-seq => $str,
	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 -display_id=”example”);

print “id is “, $seq-display_id,”
”;
print $seq-seq, “
”;
my $revcom = $seq-revcom;
print $revcom-seq, “
”;
print “frame1=”,$seq-translate-seq,“
”;

id is example
ATGAATGATGAA
TTCATCATTCAT
trans frame1=MNDE
Sequence Features

Bio::SeqFeatureI - interface - GFF derived

  start(), end(), strand() for location information
  location() - Bio::LocationI object (to represent
  complex locations)
  score,frame,primary_tag, source_tag - feature
  information
  spliced_seq() - for attached sequence, get the
  sequence spliced.
http://www.sanger.ac.uk/Software/formats/
           GFF/GFF_Spec.shtml
has-a
                               Bio::Annotation::Collection
           Bio::Seq

                                                    Annotations
Features                                    has-a
               has-a


 Bio::SeqFeature::Generic       Bio::Annotation::Comment
               has-a


      Bio::LocationI
Reading and Writing
    Sequences
Bio::SeqIO

  fasta, genbank, embl, swissprot,...

Takes care of writing out associated features
and annotations

Two functions

  next_seq (reading sequences)

  write_seq (writing sequences)
Writing a Sequence

use Bio::SeqIO;
# Let’s convert swissprot to fasta format
my $in = Bio::SeqIO-new(-format = ‘swiss’,
                          -file    = ‘file.sp’);
my $out = Bio::SeqIO-new(-format = ‘fasta’,
                          -file    = ‘file.fa’);`
while( my $seq = $in-next_seq ) {
  $out-write_seq($seq);
}
v         Mon Nov 17 12:38:43 2003       1
         opt      E()
 20     323     0:==
  22       0     0:           one = represents 184 library sequences
  24       2     0:=
  26      12     2:*
  28      61    26:*
  30     211   157:*=
  32     664   607:===*
  34    1779 1645:========*=
  36    3558 3379:==================*=
  38    5908 5584:==============================*==
  40    8049 7790:==========================================*=
  42   10001 9522:===================================================*===
  44   10660 10503:=========================================================*
  46   10987 10698:==========================================================*=
  48   10332 10242:=======================================================*=
  50    9053 9346:==================================================*
  52    7736 8217:=========================================== *
  54    6828 7018:======================================*
  56    5448 5863:============================== *
  58    4484 4813:========================= *
  60    3818 3899:=====================*




                           Sequence Database
  62    2942 3126:================*
  64    2407 2486:=============*
  66    1866 1965:==========*
  68    1495 1545:========*
  70    1169 1211:======*
  72     886   946:=====*
  74     708   738:====*
  76     542   574:===*




                               Searching
  78     451   446:==*
  80     355   347:=*
  82     271   265:=*
  84     211   210:=*
  86     151   163:*
  88     104   126:*          inset = represents 3 library sequences
  90     101    97:*
  92      78    75:*         :========================*=
  94      56    58:*         :===================*
  96      38    45:*         :============= *
  98      26    35:*         :========= *
 100      26    27:*         :========*
 102      20    21:*         :======*
 104      13    16:*         :=====*
 106      22    12:*         :===*====
 108      10    10:*         :===*
 110       5     7:*         :==*
 112       4     6:*         :=*
 114       4     4:*         :=*
 116       3     3:*         :*
 118       9     3:*         :*==
120     110     2:*         :*====================================
A Detailed look at
   BLAST parsing

3 Components

  Result: Bio::Search::Result::ResultI

  Hit: Bio::Search::Hit::HitI

  HSP: Bio::Search::HSP::HSPI
use Bio::SearchIO;
my $cutoff = ’0.001’;
my $file = ‘BOSS_Ce.BLASTP’,
my $in = new Bio::SearchIO(-format = ‘blast’,
                            -file    = $file);
while( my $r = $in-next_result ) {
  print Query is: , $r-query_name,  ,
  $r-query_description, ,$r-query_length, aa
;
  print  Matrix was , $r-get_parameter(’matrix’), 
;
  while( my $h = $r-next_hit ) {
    last if $h-significance  $cutoff;
    print Hit is , $h-name, 
;
    while( my $hsp = $h-next_hsp ) {
      print  HSP Len is , $hsp-length(’total’),  ,
             E-value is , $hsp-evalue,  Bit score ,
            $hsp-score,  
,
             Query loc: ,$hsp-query-start,  ,
            $hsp-query-end, ,
             Sbject loc: ,$hsp-hit-start,  ,
            $hsp-hit-end,
;
    }
  }
}
BLAST Report
Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.

Reference:   Gish, W. (1996-2000) http://blast.wustl.edu

Query=   BOSS_DROME Bride of sevenless protein precursor.
         (896 letters)

Database:  wormpep87
           20,881 sequences; 9,238,759 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                       Smallest
                                                                         Sum
                                                               High   Probability
Sequences producing High-scoring Segment Pairs:               Score   P(N)      N

F35H10.10 CE24945   status:Partially_confirmed TR:Q20073...     182 4.9e-11     1
M02H5.2 CE25951   status:Predicted TR:Q966H5 protein_id:...     86 0.15        1
ZC506.4 CE01682 locus:mgl-1 metatrophic glutamate recept...     91 0.18        1
F23D12.2 CE05700   status:Partially_confirmed TR:Q19761 ...      73 0.45        3

F35H10.10 CE24945    status:Partially_confirmed TR:Q20073
           protein_id:AAA81683.2
        Length = 1404

 Score = 182 (69.1 bits), Expect = 4.9e-11, P = 4.9e-11
 Identities = 75/315 (23%), Positives = 149/315 (47%)

Query:    511 YPFLFDGESVMFWRIKMDTWVATGLTAAILGLIATLAILVFIVVRISLGDVFEGNPTTSI 570
              Y +F+ +     WR     +V   L   ++ + +A+LV ++V++ L V +GN + I
Sbjct:   1006 YQSVFEHITTGHWRDHPHNYVLLALITVLV--VVAIAVLVLVLVKLYLR-VVKGNQSLGI 1062
BLAST Script Results

Query is: BOSS_DROME Bride of sevenless protein
precursor. 896 aa
Matrix was BLOSUM62
Hit is F35H10.10
HSP Len is 315 E-value is 4.9e-11 Bit score 182
Query loc: 511 813 Sbject loc: 1006 1298
HSP Len is 28 E-value is 1.4e-09 Bit score 39
Query loc: 508 535 Sbject loc: 427 454
FASTA Parsing Script
use Bio::SearchIO;
my $cutoff = ’0.001;
my $file = ‘BOSS_Ce.FASTP’,
my $in = new Bio::SearchIO(-format = ‘fasta’,
                            -file    = $file);
while( my $r = $in-next_result ) {
  print Query is: , $r-query_name,  ,
  $r-query_description, ,$r-query_length, aa
;
  print  Matrix was , $r-get_parameter(’matrix’), 
;
  while( my $h = $r-next_hit ) {
    last if $h-significance  $cutoff;
    print Hit is , $h-name, 
;
    while( my $hsp = $h-next_hsp ) {
      print  HSP Len is , $hsp-length(’total’),  ,
             E-value is , $hsp-evalue,  Bit score ,
            $hsp-score,  
,
             Query loc: ,$hsp-query-start,  ,
            $hsp-query-end, ,
             Sbject loc: ,$hsp-hit-start,  ,
            $hsp-hit-end,
;
    }
  }
}
FASTA Report
 FASTA searches a protein or DNA sequence data bank
 version 3.4t20 Sep 26, 2002
Please cite:
 W.R. Pearson  D.J. Lipman PNAS (1988) 85:2444-2448

Query library BOSS_DROME.aa vs /blast/wormpep87 library
searching /blast/wormpep87 library

  1BOSS_DROME Bride of sevenless protein precursor. - 896 aa
 vs /blast/wormpep87 library
9238759 residues in 20881 sequences
  Expectation_n fit: rho(ln(x))= 5.6098+/-0.000519; mu= 12.8177+/- 0.030
 mean_var=107.8223+/-22.869, 0's: 0 Z-trim: 2 B-trim: 0 in 0/62
 Lambda= 0.123515
 Kolmogorov-Smirnov statistic: 0.0333 (N=29) at 48

FASTA (3.45 Mar 2002) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 38, opt: 26, gap-pen: -12/-2, width: 16
 Scan time: 9.680
The best scores are:                                       opt bits E(20881)
F35H10.10 CE24945     status:Partially_confirmed T (1404) 207 48.5 6.8e-05
T06E4.11 CE06377 locus:pqn-63 status:Predicted     ( 275) 122 32.6       0.8
C33B4.3 CE01508   ankyrin and proline rich domain (1110) 124 33.6        1.6
Y48C3A.8 CE22141     status:Predicted TR:Q9NAG3 pr ( 291) 110 30.5       3.7
Y34D9A.2 CE30217     status:Partially_confirmed TR ( 326) 108 30.2       5.1
K06H7.3 CE26941   Isopentenyl-diphosphate delta i ( 618) 107 30.3        8.9
F44B9.8 CE29044   ARPA status:Partially_confirmed ( 388) 104 29.5        9.4

F35H10.10 CE24945    status:Partially_confirmed TR:Q20 (1404 aa)
 initn: 94 init1: 94 opt: 207 Z-score: 197.9 bits: 48.5 E(): 6.8e-05
Smith-Waterman score: 275; 22.527% identity (27.152% ungapped) in 728 aa
overlap (207-847:640-1330)

        180       190       200       210       220          230
BOSS_D RAISIDNASLAENLLIQEVQFLQQCTTYSMGIFVDWELYKQLESVIKD---LEYNIWPIP
FASTA Script Results

Query is: BOSS_DROME Bride of sevenless protein
precursor. 896 aa
Matrix was BL50
Hit is F35H10.10
HSP Len is 728 E-value is 6.8e-05 Bit score 197.9
Query loc: 207 847 Sbject loc: 640 1330
Using the Search::Result
          object
use Bio::SearchIO;
use strict;
my $parser = new Bio::SearchIO(-format = ‘blast’,
                               -file = ‘file.bls’);
while( my $result = $parser-next_result ){
  print “query name=“, $result-query_name, “ desc=”,
        $result-query_description, “, len=”,$result-
query_length,“
”;
  print “algorithm=“, $result-algorithm, “
”;
  print “db name=”, $result-database_name, “ #lets=”,
  $result-database_letters, “ #seqs=”,$result-
database_entries, “
”;
  print “available params “, join(’,’,
        $result-available_parameters),”
”;
  print “available stats “, join(’,’,
        $result-available_statistics), “
”;
  print “num of hits “, $result-num_hits, “
”;
}
Using the Search::Hit
             Object
use Bio::SearchIO;
use strict;
my $parser = new Bio::SearchIO(-format = ‘blast’,
                                -file = ‘file.bls’);
while( my $result = $parser-next_result ){
  while( my $hit = $result-next_hit ) {
    print “hit name=”,$hit-name, “ desc=”, $hit-description,
          “
 len=”, $hit-length, “ acc=”, $hit-accession,
”
”;
    print “raw score “, $hit-raw_score, “ bits “, $hit-bits,
          “ significance/evalue=“, $hit-evalue, “
”;
  }
}
Cool Hit Methods

start(), end() - get overall alignment start
and end for all HSPs

strand() - get best overall alignment strand

matches() - get total number of matches
across entire set of HSPs (can specify only
exact ‘id’ or conservative ‘cons’)
Using the Search::HSP
               Object
use Bio::SearchIO;
use strict;
my $parser = new Bio::SearchIO(-format = ‘blast’, -file = ‘file.bls’);
while( my $result = $parser-next_result ){
  while( my $hit = $result-next_hit ) {
     while( my $hsp = $hit-next_hsp ) {
       print “hsp evalue=“, $hsp-evalue, “ score=“ $hsp-score, “
”;
       print “total length=“, $hsp-hsp_length, “ qlen=”,
             $hsp-query-length, “ hlen=”,$hsp-hit-length, “
”;
       print “qstart=”,$hsp-query-start, “ qend=”,$hsp-query-end,
             “ qstrand=“, $hsp-query-strand, “
”;
       print “hstart=”,$hsp-hit-start, “ hend=”,$hsp-hit-end,
             “ hstrand=“, $hsp-hit-strand, “
”;
       print “percent identical “, $hsp-percent_identity,
             “ frac conserved “, $hsp-frac_conserved(), “
”;
       print “num query gaps “, $hsp-gaps(’query’), “
”;
       print “hit str =”, $hsp-hit_string, “
”;
       print “query str =”, $hsp-query_string, “
”;
       print “homolog str=”, $hsp-homology_string, “
”;
  }
}
Cool HSP methods

rank() - order in the alignment (which you
could have requested, by score, size)

matches - overall number of matches

seq_inds - get a list of numbers
representing residue positions which are

  conserved, identical, mismatches, gaps
SearchIO system

BLAST (WU-BLAST, NCBI, XML, PSIBLAST,
BL2SEQ, MEGABLAST, TABULAR (-m8/m9))

FASTA (m9 and m0)

HMMER (hmmpfam, hmmsearch)

UCSC formats (WABA, AXT, PSL)

Gene based alignments

  Exonerate, SIM4, {Gene,Genome}wise
SearchIO reformatting


 Supports output of Search reports as well

 Bio::SearchIO::Writer

   “BLAST flavor” HTML, Text

   Tabular Report Format
Bioperl Reformatted HTML of BLASTP Search Report
                        for gi|6319512|ref|NP_009594.1|
      BLASTP 2.0MP-WashU [04-Feb-2003] [linux24-i686-ILP32F64 2003-02-04T19:05:09]
      Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA.
      All Rights Reserved.
      Reference: Gish, W. (1996-2000) http://blast.wustl.edu
                                                              Hit Overview
                                  Score: Red= (=200), Purple 200-80, Green 80-50, Blue 50-40, Black 40

 Build custom
   overview
(Bio::Graphics)
      Query= gi|6319512|ref|NP_009594.1| chitin synthase 2; Chs2p [Saccharomyces cerevisiae]
           (963 letters)
      Database: cneoA_WI.aa
           9,645 sequences; 2,832,832 total letters
                                                      Hyperlink to external
                                                                                     Score    E
                                                             resources
                           Sequences producing significant alignments:
                                                                                     (bits) value
       cneo_WIH99_157.Gene2 Start=295 End=4301 Strand=1 Length=912 ExonCt=24         1650 1.6e-173
       cneo_WIH99_63.Gene181 Start=154896 End=151527 Strand=-1 Length=876 ExonCt=13 1441 3.9e-149
                                                                                                            Hyperlink to
       cneo_WIH99_133.Gene1 Start=15489 End=19943 Strand=1 Length=1017 ExonCt=23     1357 3e-142
                                                                                                           alignment part
       cneo_WIH99_45.Gene2 Start=84 End=3840 Strand=1 Length=839 ExonCt=25           1311 1.5e-138
                                                                                                              of report
       cneo_WIH99_112.Gene165 Start=122440 End=118921 Strand=-1 Length=1036 ExonCt=9 198 1.2e-15
       cneo_WIH99_11.Gene7 Start=39355 End=42071 Strand=1 Length=761 ExonCt=9        172 6.4e-13
       cneo_WIH99_60.Gene9 Start=36153 End=32819 Strand=-1 Length=1020 ExonCt=5      166 1.2e-12
       cneo_WIH99_106.Gene88 Start=242538 End=238790 Strand=-1 Length=1224 ExonCt=3 157 6.3e-09
Turning BLAST into
               HTML
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;

my $in = new Bio::SearchIO(-format = 'blast',
	

 	

 	

 -file = shift @ARGV);

     my $writer = new
Bio::SearchIO::Writer::HTMLResultWriter();
     my $out = new Bio::SearchIO(-writer = $writer
	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 	

 -file = “file.html”);
     $out-write_result($in-next_result);
Turning BLAST into HTML
# to filter your output
  my $MinLength = 100; # need a variable with scope outside the method
  sub hsp_filter {
      my $hsp = shift;
      return 1 if $hsp-length('total')  $MinLength;
  }
  sub result_filter {
      my $result = shift;
      return $hsp-num_hits  0;
  }

  my $writer = new Bio::SearchIO::Writer::HTMLResultWriter
                     (-filters = { 'HSP' = hsp_filter} );
  my $out = new Bio::SearchIO(-writer = $writer);
  $out-write_result($in-next_result);

  # can also set the filter via the writer object
  $writer-filter('RESULT', result_filter);
Trees

• Support reading of Tree data
 • newick, nexus, nhx formats
• Manipulation of Trees
• Trees are connections of Nodes which have
  ancestor and children pointers


                                              40
Simple Tree Routines
                                             LCA


• Lowest Common Ancestor
• Tests of monophyly, paraphyly




                                  outgroup
• Reroot tree
• Distances between two nodes
• Find a node by name
                                              41
Constructing Trees

• For Purely Bioperl Solution (only in post
  Bioperl 1.4 releases)
• Bio::Align::DNAStatistics and
  ProteinStatistics can build pairwise
  distance matricies
• Bio::Tree::DistanceFactory has UPGMA,
  Neighbor-Joining implemented

                                              42
Interfacing with Phylip

• Bio::Tools::Run::Phylo::Phylip
 • ProtDist (also parser Bio::Matrix::IO)
 • Neighbor, ProtPars, SeqBoot, Consense
 • DrawGram, DrawTree

                                            43
Reading/Writing a Tree

#!/usr/bin/perl -w
use Bio::TreeIO;
use strict;
my $in = Bio::TreeIO-new(-format = ‘nexus’,
                           -file  = ‘trees.nex’);
my $out = Bio::TreeIO-new(-format = ‘newick’,
                            -file  = ‘trees.nh’);
while( my $tree = $in-next_tree ) {
  $out-write_tree($tree);
}




                        44
Fetching subset of nodes
 #!/usr/bin/perl -w
 use strict;
 use Bio::TreeIO;
 my $in = Bio::TreeIO-new(-format = ‘newick’,
                           -fh    = *DATA);
 if( my $tree = $in-next_tree ) {
    my @nodes = $tree-get_nodes;
    my @tips = grep { $_-is_Leaf } @nodes;
    print there are ,scalar @tips,  tips
;
    my @internal = grep { ! $_-is_Leaf } @nodes;
    print there are ,scalar @internal,  internal
 nodes
;
    my ($A_node) = $tree-find_node(-id = 'A');
    print branch length of Ancestor of ,$A_node-id,
     is , $A_node-ancestor-branch_length, 
;
 }
 __DATA__
 (((A:10,B:11):2,C:5),((D:7,F:6):17,G:8));
                         45
Walking up the tree
           (tips to root)
       if( my $tree = $in-next_tree ) {
          my @tips = grep { $_-is_Leaf } $tree-get_nodes;
          for my $node ( @tips ) {
             my @path;
             while( defined $node) {
                push @path, $node-id;
                $node = $node-ancestor;
             }
             print join(“,“, @path), “
”;
          }
       }
       __DATA__
       (((A:10,B:11)I1:2,C:5)I2,((D:7,F:6)I3:17,G:8)I4)Root;
                                                      A

                                                 I1
C,I2,Root                                             B
                                            I2
A,I1,I2,Root
                                                      C
B,I1,I2,Root
                                     Root
G,I4,Root                                             D

                                                 I3
D,I3,I4,Root                                          F
                                            I4
F,I3,I4,Root                   46
                                                      G
From Alignments to Trees
• Bio::AlignIO to parse the alignment
• Bio::Align::ProteinStatistics to compute
  pairwise distances
• Bio::Tree::DistanceFactory to build a tree
  based on a matrix of distances using NJ or
  UPGMA
• More sophisticated tree building should be
  done with tools like phyml, PAUP,
  MOLPHY, PHYLIP, MrBayes, or PUZZLE
                       47
Testing phylogenetic
       hypotheses
• No sophisticated ML methods are currently
  built in Bioperl for testing for phylogenetic
  correlations, etc
• Can export trees and use tools like
  Mesquite



                      48
Bioperl  Population
      Genetics
• Basic data
 • Marker - polymorphic region of genome
 • Individual - individual sampled
 • Genotype - observed allele(s) for a
    marker in an individual
 • Population - collection of individuals
                                            49
The Players
• Bio::PopGen::Population - container for
  Individuals, calculate summary stats
• Bio::PopGen::Individual - container for
  Genotypes
• Bio::PopGen::Marker - summary info
  about a Marker (primers, genome loc,
  allele freq)
• Bio::PopGen::Genotype - pairing of
  Individual  Marker; allele container     50
Population Genetics
   Data Objects
  Population
        Has
                            Map
  Individual
                     Has
        Has
                           Marker
  Genotype
               Has

                                    51
Some Data Formats
• PrettyBase format (Seattle SNP data)
  MARKER      SAMPLEID   ALLELE1 ALLELE2
  ProcR2973EA      01      C       T
  ProcR2973EA      02      N       N
  ProcR2973EA      03      C       C
  ProcR2973EA      04      C       T
  ProcR2973EA      05      C       C



• Comma Delimited (CSV)
  SAMPLE,MARKERNAME1,MARKERNAME2,...
  SAMPLE,ProcR2973EA,Marker2
  sample-01,C T,G T
  sample-02,C C,G G



• Phase format
  3
  5
  P 300 1313 1500 2023 5635
  MSSSM
  #1
  12 1 0 1 3
                                           52
  11 0 1 0 3
Reading in Data

use Bio::PopGen::IO;
my $in = Bio::PopGen::IO-new(-format = ‘csv’
                         -file   = ‘dat.csv’);
my @pop;
while( my $ind = $in-next_individual ) {
  push @pop, $ind;
}
# OR
my $pop = $in-next_population;




                                                  53
Ready Built Stuff
• Bio::PopGen::PopStats - population level
  statistics (only FST currently)

• Bio::PopGen::Statistics - suite of
  Population Genetics statistical tests and
  summary stats.
• Bio::PopGen::Simulation::Coalescent -
  primitive Coalescent simulation
  • Basic tree topology and branch length
    assignment.                               54
Using the Modules
use Bio::PopGen::Statistics;
my $stats = Bio::PopGen::Statistics-new();
my $pi = $stats-pi($population);
# or use an array reference of Individuals
my $pi = $stats-pi(@individuals);
# Tajima’s D
my $TajimaD = $stats-tajima_D($population);
# Fu and Li’s D
my $FuLiD = $stats-fu_and_li_D($ingroup_pop,
                                $outgroup_ind);
# Fu and Li’s D*
my $FLDstar = $stats-fu_and_li_D_star($population);

# pairwise composite LD
my %LDstats = $stats-composite_LD($population);
my $LDarray = $LDstats{’marker1’}-{’marker2’};
my ($ldval,$chisq) = @$LDarray;

                                                       55
Getting Data from
      Alignments
• use Bio::AlignIO to read in Multiple
  Sequence Alignment data
• Bio::PopGen::Utilities aln_to_population
  will build Population from MSA
  • Will make a “Marker” for every
    polymorphic site (or if asked every site)
  • Eventually will have ability to only get
    silent/non-silent coding sites
                                                56
Simulations

• Simulation::Coalescent (Hudson 1990)
 •   Generate random topology

 •   Assign mutations to branches

 •   Tree tips are AlleleNodes (Tree::Node 
     PopGen::Individual)



                                               57
Coalescent Simulation
use Bio::PopGen::Simulation::Coalescent;
my $factory = new
Bio::PopGen::Simulation::Coalescent(
 -sample_size = 10,
 -maxcount    = 100); # max number trees

my $tree = $factory-next_tree;
$factory-add_Mutations($tree,20);
my @inds = $tree-get_leaf_nodes;

use Bio::PopGen::Statistics;
my $pi = Bio::PopGen::Statistics-pi(@inds);
my $D = Bio::PopGen::Statistics- tajima_D
(@inds);

                                                58
Scripts
scripts/popgen/composite_LD
-i filename.prettybase
[-o outfile.LD]
[--format prettybase/csv]
[--sortbyld]
[--noconvertindels]
ProcR2973EA   01   C   T
ProcR2973EA   02   N   N
ProcR2973EA   03   C   C
ProcR2973EA   04   C   T
ProcR2973EA   05   C   C
ProcR2973EA   06   C   T
ProcR2973EA   07   C   C

ProC9198EA,ProcR2973EA - LD=-0.0375, chisq=2.56
                                                  59
Simple pairwise Ka, Ks
   rates estimation
• Within Bioperl can calculate pairwise K ,K
                                         a   s
  and Ka/Ks using Nei’s method.
• Bio::Align::DNAStatistics
• calc_KaKs_pair($alnobj,$seq1id,$seq2id)
  calc_all_KaKs_pairs($alnobj)



                     60
Ortholog/Homolog
       finding
• Doesn’t really require much of Bioperl to
  do this
• Want to find best reciprocal BLAST/
  FASTA/SSEARCH hits
• Build up a matrix, where each query
  sequence has a list of its best hits


                      61
my $reportfile = ‘/usr/local/bio/data/reports/fly_worm.FASTA.m9.gz’;
open(IN,”zcat $reportfile |”) || die $!;
my %data;
while(IN) {
    my @line = split;
    my ($query,$subject,$pid,$alnlen) = @line;
    my ($evalue) = $line[10];
    next if $query eq $subject;
    push @{$data{$query}}, [$subject, $evalue,$pid,$alnlen];
}
my %seen; my %orthologs; # some tracking and data vars
for my $q ( keys %data ) {
  next if $seen{$q}++; # skip if we’ve processed it already
  my ($s, $evalue,$pid,$alnlen) = @{$data{$q}-[0]};
  my ($qsp) = split(/:/,$q); my ($ssp) = split(/:/,$s);
  $seen{$s}++;
  next if( $qsp eq $ssp || $pid  30 ||
       (defined $data{$q}-[1]  1e-10 
       abs($data{$q}-[1]-[1] - $evalue) ) ); # second best hit
                                               # is too good
  if( $data{$s}-[0]-[0] eq $q ) {
     my $id = join(”-”, sort ($qsp,$ssp));
     push @{$orthologs{$id}}, [$q,$s,$evalue,$pid,$alnlen];
  }
}
                                 62
# print out the ortholog pairs, do some sorting to insure that
# order of names is consistent and order by evalue


for my $id ( keys %orthologs ) {
  open(OUT, “$id.orthologs”) || die $!;
  for my $pair ( sort {$a-[2] = $b-[2] }
                 @{$orthologs{$id}} ) {
     my ($q,$s,$evalue,$pid,$alnlen) = @$pair;
     ($q,$s) = sort ( $q, $s ); # assume prefixed
     print OUT join(”	”, ($q,$s,$evalue,$pid,$alnlen)), “
”;
  }
}




                             63
Gene families
• Families can be constructed with single-
  linkage - merging all homologs into a single
  cluster
• Or using a tool like MCL (http://micans.org/
  mcl) to build gene families from a matrix of
  pairwise distances (BLAST or FASTA
  scores)


                     64
Automating PAML

• PAML - phylogenetic analysis with
  maximum likelihood
• Estimate synonymous and non-synonymous
  substitution rates
• Along branches of a tree or in a pairwise
  fashion


                       65
Preparing Data
• Multiple sequence alignments of protein
  coding sequence
• No stop codons!
• Must be aligned on codon boundaries
• Easiest way is to align at protein level, then
  project back into CDS alignment


                       66
Doing Protein
         Alignments
• Bio::Tools::Run::Alignment::Clustalw or
  Bio::Tools::Run::Alignment::MUSCLE or just
  prepare the sequence files and run the
  alignment programs via scripts
• Bio::AlignIO to parse the alignment data
• Bio::Align::Utilities to project back into
  CDS space

                      67
Build tree or assume a
          tree
• If doing analysis of genomes which have a
  know species tree - use that tree
• Branch lengths are not part of PAML.
  Multiple topologies can be provided to test
  alternative hypotheses (by comparing
  maximum likelihood values)


                     68
Running PAML
#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
my $factory = Bio::Tools::Run::Phylo::PAML::Codeml-new(
              -params = { ‘runmode’ = -2,
                           ‘seqtype’ = 1});
my $alnio = Bio::AlignIO-new(-format = ‘clustalw’,
                              -file   = ‘cds.aln’);
my $aln = $alnio-next_aln; # get the alignment from file
$factory-alignment($aln); # set the alignment
my ($returncode,$parser) = $factory-run();
my $result = $parser-next_result;
my $MLmatrix = $result-get_MLMatrix;

print Ka = , $MLmatrix-[0]-[1]-{’dN’},
;
print Ks = , $MLmatrix-[0]-[1]-{’dS’},
;
print Ka/Ks = , $MLmatrix-[0]-[1]-{’omega’},
;




                           69
Parsing PAML
#!/usr/bin/perl -w
use strict;
use Bio::Tools::Phylo::PAML;
my $parser = Bio::Tools::Phylo::PAML-new
  (-file = ‘results/mlc’, -dir = ‘results’);
if( my $result = $parser-next_result ) {
  my @otus = $result-get_seqs;
  # get Nei  Gojobori dN/dS matrix
  my $NGmatrix = $result-get_NGmatrix;
  printf “%s and %s dS=%.4f dN=%.4f Omega=%.4f
”,
    $otus[0]-display_id, $otus[1]-display_id,
    $NGMatrix-[0]-[1]-{dS}, $NGMatrix-[0]-[1]-{dN},
    $NGMatrix-[0]-[1]-{omega};
}




                        70
Getting the Trees out
my @trees = $result-get_trees;
for my $tree ( @trees ) {
    print “likelihood is “, $tree-score, “
”;
    # do something else with the trees,
    # for non runmode -2 results
# inspect the tree, branch specific rates
# the t (time) parameter is available via
# (omega, dN, etc.) are available via
# ($omega) = $node-get_tag_values('omega');
for my $node ( $tree-get_nodes ) {
    print $node-id, “ t=”,$node-branch_length,
     “ omega “, $node-get_tag_values(‘omega’);
  }
}




                         71
Summary

Lots of modules to do lots of things

How to find out what exists?

Read HOWTOs, bptutorial, Browse the docs
website - http://doc.bioperl.org/

Ask on-list bioperl-l@bioperl.org
Fungal Genomics

• ~45 fungal genomes currently
  sequenced
• 5-10 more in production at sequencing
  centers
• Diverse sampling across the fungal tree
 • Clusters of closely related species
 • Cryptococcus, Aspergillus,
   Saccharomyces, Candida
Genome Annotation
                      Pipeline
               Rfam
             tRNAscan

                            ZFF to GFF3                     GFF to AA
              SNAP                                                                      FASTA
                                                                          predicted
                                                                                                         Proteins
                                                                           proteins
             Twinscan                                                                  all-vs-all
                            GFF2 to GFF3




                                             Bio::DB::GFF
                            Tools::Glimmer
             Glimmer                                                                         SearchIO
                                                             protein to
             Genscan
Genome                                                        genome
                            Tools::Genscan
                                                            coordinates
                                                                                       homologs
                                                                           HMMER                        MCL
                              SearchIO                       SearchIO                     and
             BLASTZ
                                                                                       orthologs
             BLASTN
                                                             Tools::GFF     GLEAN
                            GFF2 to GFF3
            exonerate                                                     (combiner)
Proteins                                                                                                 Gene
           protein2genome
                                                                                                        families



            exonerate
 ESTs
            est2genome


                                                                                                    BOSC 2005
Genome Browsing




                  BOSC 2005
Comparative Genomics

• Find orthologs, align, build trees
 • Assess gene structure among
   orthologs
• Find gene families
 • Assess gene structure
 • Copy number among genomes

                                       BOSC 2005
Comparative Genomics:
          Gene Structure

                                                                                        Species
                                                                                       phylogeny


                                                                                              Bio::Tree/TreeIO
             Bio::DB
                                    Bio::AignIO                  Bio::SimpleAlign
            Bio::SeqIO
                         MUSCLE                   Alignments                          Score shared
orthologs
                                  Bio::Coordinate with Introns                      intorn positions
                                                                                        on Tree




                                                                                                 BOSC 2005
Phylogenomics

                                                                    Species
                                                                   phylogeny




                                                          Bio::TreeIO
   gene
                                                MrBayes
 families    Bio::DB
                                                 PhyML             Classify tree
            Bio::SeqIO            Bio::AignIO
                         MUSCLE                                    differences
                                                ProtML
                                                PUZZLE             Bin genes by
orthologs                                                            topology




                                                                                   BOSC 2005
Arabidopsis thaliana
                                                                                           Dictyostelium discoideum
                                                                                           Rhizopus oryzae
                                                                                           Ustilago maydis
                                             Basidiomycota
                                                                                           Cryptococcus neoformans
                                                                                           Cryptococcus neoformans
                                                                                           Cryptococcus neoformans
                                                                                           Cryptococcus neoformans
                                                                                           Coprinus cinereus
                                                                                           Phanerochaete chrysosporium
                                                                                           Schizosaccharomyces pombe
               Fungi
                                                                                           Sclerotinia sclerotiorum
                         Ascomycota
                                                                                           Botrytis cinerea
                                                                                           Magnaporthe grisea
                                                                    Sordiomycota
                                                                                           Neurospora crassa
                                                                                           Podospora anserina
                                                                                           Chaetomium globosum
                                                                                           Trichoderma reesi
                                                  Euasco
                                                                                           Fusarium verticillioides
                                                                                           Fusarium graminearum
                                                                                           Stagonospora nodorum
                                                                                           Aspergillus terreus
                                                                            Aspgergillus
                                                                                           Aspergillus nidulans
                                                                                           Aspergillus fumigatus
                                                                    Eurotiomycota
                                                                                           Histoplasma capsulatum
                                                                       Onygenales          Coccidioides immitis
                                                                                           Uncinocarpus reesii
                                                                                           Yarrowia lipolytica
                                                                                           Kluyveromyces lactis
                                                                                           Ashbya gossypii
                                         Hemiascomycota
Opistikant                                                                                 Candida glabrata
                                                                                           Saccharomyces castellii
                                                                                           Saccharomyces bayanus
                                                                                           Saccharomyces kudriavzevii
                                                                                           Saccharomyces mikatae
                                                                                           Saccharomyces paradoxus
                                                                                           Saccharomyces cerevisiae
                                                                         Saccharomyces
                                                                         Complex           Saccharomyces cerevisiae
                                                                                           Saccharomyces cerevisiae
                                                                                           Saccharomyces kluyveri
                                                                                           Kluyveromyces waltii
                                                                                           Candida tropicalis
                                                                                           Candida albicans
                                                                                           Candida dubliniensis
                                                                                           Candida guilliermondii
                                                                                           Debaryomyces hansenii
                                                                                           Candida lusitaniae
                                                                                           Caenorhabdtis elegans
                                                                              Nematoda
                                                                                           Caenorhabdtis briggsae
                                                                                           Apis mellifera
             Metazoa
                                              Insecta                                      Drosophila melanogaster
                       Coelomata...                          Diptera
                                                                                           Anopholes gambiae
                                                                                           Ciona intestinalis
                                                                                           Xenopus tropicalis
                                Vertebrate
                                                                                           Danio rerio
                                                             Fish                          Fugu rubripes
                                                                                           Tetraodon nigroviridis
                                                                                           Gallus gallus
                                                                                           Canis familiaris
                                                                            Mammalia        Rattus norvegicus
                                                                                          Rodenta
                                                                                            Mus musculus
                                                                                     PrimateRodent
                                                                                            Homo sapiens
                                                                                         Primate
                                                                                            Pan troglodytes
Rhizopus oryzae
                                                                          Ustilago maydis
                                  Basidiomycota
                                                                          Cryptococcus neoformans
                                                                          Cryptococcus neoformans
                                                                          Cryptococcus neoformans
                                                                          Cryptococcus neoformans
                                                                          Coprinus cinereus
                                                                          Phanerochaete chrysosporium
                                                                          Schizosaccharomyces pombe
             Fungi
                                                                          Sclerotinia sclerotiorum
                     Ascomycota
                                                                          Botrytis cinerea
                                                                          Magnaporthe grisea
                                                   Sordiomycota
                                                                          Neurospora crassa
                                                                          Podospora anserina
                                                                          Chaetomium globosum
                                                                          Trichoderma reesi
                                       Euasco
                                                                          Fusarium verticillioides
                                                                          Fusarium graminearum
                                                                          Stagonospora nodorum
                                                                          Aspergillus terreus
                                                           Aspgergillus
                                                                          Aspergillus nidulans
                                                                          Aspergillus fumigatus
                                                   Eurotiomycota
                                                                          Histoplasma capsulatum
                                                      Onygenales          Coccidioides immitis
                                                                          Uncinocarpus reesii
                                                                          Yarrowia lipolytica
                                                                          Kluyveromyces lactis
                                                                          Ashbya gossypii
                                  Hemiascomycota
Opistikant                                                                Candida glabrata
                                                                          Saccharomyces castellii
                                                                          Saccharomyces bayanus
                                                                          Saccharomyces kudriavzevii
                                                                          Saccharomyces mikatae
                                                                          Saccharomyces paradoxus
                                                                          Saccharomyces cerevisiae
                                                        Saccharomyces
                                                        Complex           Saccharomyces cerevisiae
                                                                          Saccharomyces cerevisiae
                                                                          Saccharomyces kluyveri
                                                                          Kluyveromyces waltii
                                                                          Candida tropicalis
                                                                          Candida albicans
                                                                          Candida dubliniensis
                                                                          Candida guilliermondii
                                                                          Debaryomyces hansenii
                                                                          Candida lusitaniae
Consensus Splice Site
       Exon                                                                                                                           Exon


                                                                                                              AG
                  GT G
                       H.sapiens exon|intron                                                           H.sapiens intron|exon
   2                                                                             2




                                                                                                    C
bits




                                                                              bits
   1                                                                             1

                                   A
                                   A                                                      T
                                                       T
         G
        A                                                                                                                             G
                                  G                                                                 T
                                                                                   C                                                      A
                                           C G
                                            T                                                                                         C
         T
        T                                                                                                                              T
   0                                                                             0
                                             C
             A                                                                                                                        A
                                               T
       G                                                                                       T




                                                C                                                                                      C
                                                                                          A

       C                                   G       A                                  G
             C
        -2



             -1



                   0



                              1



                                       2



                                               3



                                                   4



                                                           5




                                                                                          -5




                                                                                               -4




                                                                                                        -3




                                                                                                                   -2




                                                                                                                                 -1




                                                                                                                                      0




                                                                                                                                             1
   5′                                                                  3′            5′




                  GTATGT                                                                                      AG
                                                       weblogo.berkeley.edu                                                               weblogo.berkele
                    S.cerevisiae exon|intron
   2                                                                                                  S.cerevisiae intron|exon
                                                                                 2
bits




   1




                                                                              bits
                                                                                 1

                                                                                                    T
                                                                                           A
                                                                                                    C
                                           C
        A
                                                                                          T
   0
             G
        T                                              A
                                           A
             A
       G


                                                       C
       C




                                                                                          A
        -2



             -1



                   0



                              1



                                       2



                                               3



                                                   4



                                                           5
                                                                                           T
                                                                                 0     G
                                                                                      C




   5′                                                                  3′
                                                                                                    A
                                                                                               C




                                                                                          -5




                                                                                               -4




                                                                                                        -3




                                                                                                                   -2




                                                                                                                                 -1




                                                                                                                                      0




                                                                                                                                             1
                                                                                 5′




                  GT G                                                                                        AG
                                                       weblogo.berkeley.edu
                                                                                                                                          weblogo.berkele

                  F.graminearium exon|intron                                                        F.graminearium intron|exon
   2                                                                             2
bits




                                                                              bits
   1                                                                             1

                                   AA                  T                                            C
                                                                                                    T
                                  GT                A
                                                   T
             G                                                                            T


   0                                                                             0
                                                    C
                                   C               A
                                  C
             T
                                                                                      C



                                                                                                    A
                                                                                               A




             C




                                   T
                                                                                          A
                                                    G
                                                   C
             A                             G
        -2



             -1



                   0



                              1



                                       2



                                               3



                                                   4



                                                           5




                                                                                          -5




                                                                                               -4




                                                                                                        -3




                                                                                                                   -2




                                                                                                                                 -1




                                                                                                                                      0




                                                                                                                                             1
   5′                                                                  3′        5′




                                                                                                              AG
                  GT G
                                                       weblogo.berkeley.edu                                                               weblogo.berkele

                   C.neoformans exon|intron                                                          C.neoformans intron|exon
   2                                                                             2
bits




                                                                              bits




   1                                                                             1

                                                       T                                            T
                                   AA
                                                                                                    C
                                  G                    C
             G                             C                                              T



   0                                                                             0
                                               TT
             T                                   A                                        A




                                                                                                    A
                                  C
             A
                                                       G
                                           G                                          C
                         C
                                                   A
        -2



             -1



                   0



                              1



                                       2



                                               3



                                                   4



                                                           5




                                                                                          -5




                                                                                               -4




                                                                                                        -3




                                                                                                                   -2




                                                                                                                                 -1




                                                                                                                                      0




                                                                                                                                             1
   5′                                                                  3′        5′
                                                       weblogo.berkeley.edu                                                               weblogo.berkele
2290
                      Arabidopsis thaliana
                                                                                        2685
                                          Fugu rubripes
                     2746                                                               2656
                                          Mus musculus
6699                           2767
                     2772
 -                             2772
                                                                                        2735
                                          Homo sapiens
                     2748
803.6                          2769
                                                                                         947
                                            Rhizopus oryzae
                             1498
              2109
                             1648                                                        86
                                                    Ustilago maydis
              2428
                             1490
              2064
                                                            Cryptococcus neoformans     1578
                     1614     958.6                                                     1615
                                                   Phanerochaete chrysosporium
                     1781        1519.9
                                                                                        1621
                                                    Coprinopsis cinerea
                     1605

                                                                                         214
                                                        Schizosaccharomyces pombe
                      1507
                                                                                         372
                                                             Fusarium graminearum
                      1803
                                          349.7
                      1492
                                                                                         368
                                                              Magnaporthe grisea
                                                              Podospora anserina
                                            339.4                                        359
                                                378.6                                    463
                                                               Chaetomium globosum
                            1128
                                      513
                            1803                 388.4
                                                                                         336
                                                               Neurospora crassa
                                      640
                            1141
                                      509                                                403
                                                            Stagonospora nodorum
                                                             Aspergillus nidulans        469
                                      410.2
                              630                                                        481
                                                             Aspergillus fumigatus
                                            474.0
                              1184          476.0
                                                                                         474
                                                             Aspergillus terreus
                              972
                                                                                         30
                                                            Yarrowia lipolytica
                                                                                          5
                                                              Debaryomyces hansenii
                                     27
                                     29                          Saccharomyces cerevisiae 7
                                            7
                                     28
                                                        7
                                                                                          6
                                                                  Candida glabrata
                                                    7
                                                                                          6
                                                                 Kluyveromyces lactis
                                                   7/6
                                                                                          7
                                                                 Ashbya gossypii
        0.1
Intron loss
• If loss predominates, how are introns lost?
• Fink (1987) model proposes mRNA
  integration into genome.
• Boeke et al (1985) showed loss intron
  through RNA intermediate which is
  integrated into genome.
• Large scale comparisons are too far awy
  to determine recent loss events.
Searching for Intron loss in
       C. neoformans
• Average of 5 introns per
    gene (Loftus et al, 2005)
•                                   A
    Average Ks between var.
    grubii and var. gattii 0.22      D
    (roughly mouse-rat             B/C
    divergence), C. gattii vs C.
    neoformans Ks is 0.35
• 5133 4-way orthologs
Loss/gain
        events observed
• 25 out of the 5133 loci had loss or gain
  events.
• 2 had evidence 4 or more intron loss/gain
  events.
• CNI01550, putative RNA helicase missing
  10 introns in var grubii (serotype A).
• ~5-7 singleton introns in one of the
  Cryptococcus spp indicative of intron gain.
10 introns lost in
            C. neoformans var grubii

    2   3   5   6 78   9   10 11   12          13 1415 16 17 18 19 20               21           22


1   2   3   5   6 78                    9-19                               20       21           22




                                                                         C.gattii        var grubii
                                                        var neoformans




                                                                                }
                                                                         }
Happened at the locus and
  was a perfect deletion



       1k      2k     3k       4k   5k      6k        7k   8k   9k   10k       11k
                                                                           CNI01560
      CNI01540                           CNI01550
 C.neoformans var neoformans
    89% nt identity                 92% nt identity                  86% nt identity
 C.neoformans var grubii
    85% nt identity                 88% nt identity                  85% nt identity
 C.gattii
Other Research
         Projects

• Multigene phylogenies of the fungi
• Gene family evolution
 • Sugar transporter family expansion in
   C.neoformans
 • Gene family evolution mammals
   (with M. Hahn)

Contenu connexe

Tendances

Debugging: Rules & Tools
Debugging: Rules & ToolsDebugging: Rules & Tools
Debugging: Rules & ToolsIan Barber
 
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...Raman Kannan
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model Perforce
 
M12 random forest-part01
M12 random forest-part01M12 random forest-part01
M12 random forest-part01Raman Kannan
 
M11 bagging loo cv
M11 bagging loo cvM11 bagging loo cv
M11 bagging loo cvRaman Kannan
 
Perl Bag of Tricks - Baltimore Perl mongers
Perl Bag of Tricks  -  Baltimore Perl mongersPerl Bag of Tricks  -  Baltimore Perl mongers
Perl Bag of Tricks - Baltimore Perl mongersbrian d foy
 
The Joy of Smartmatch
The Joy of SmartmatchThe Joy of Smartmatch
The Joy of SmartmatchAndrew Shitov
 
Perl Sucks - and what to do about it
Perl Sucks - and what to do about itPerl Sucks - and what to do about it
Perl Sucks - and what to do about it2shortplanks
 
How to stand on the shoulders of giants
How to stand on the shoulders of giantsHow to stand on the shoulders of giants
How to stand on the shoulders of giantsIan Barber
 
M09-Cross validating-naive-bayes
M09-Cross validating-naive-bayesM09-Cross validating-naive-bayes
M09-Cross validating-naive-bayesRaman Kannan
 
TDOH 南區 WorkShop 2016 Reversing on Windows
TDOH 南區 WorkShop 2016 Reversing on WindowsTDOH 南區 WorkShop 2016 Reversing on Windows
TDOH 南區 WorkShop 2016 Reversing on WindowsSheng-Hao Ma
 
Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Pythonpugpe
 
The Magic Of Tie
The Magic Of TieThe Magic Of Tie
The Magic Of Tiebrian d foy
 
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6Workhorse Computing
 

Tendances (19)

Debugging: Rules & Tools
Debugging: Rules & ToolsDebugging: Rules & Tools
Debugging: Rules & Tools
 
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
Chapter 2: R tutorial Handbook for Data Science and Machine Learning Practiti...
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model
 
M12 random forest-part01
M12 random forest-part01M12 random forest-part01
M12 random forest-part01
 
Perl6 in-production
Perl6 in-productionPerl6 in-production
Perl6 in-production
 
C99[2]
C99[2]C99[2]
C99[2]
 
M11 bagging loo cv
M11 bagging loo cvM11 bagging loo cv
M11 bagging loo cv
 
C99
C99C99
C99
 
Perl Bag of Tricks - Baltimore Perl mongers
Perl Bag of Tricks  -  Baltimore Perl mongersPerl Bag of Tricks  -  Baltimore Perl mongers
Perl Bag of Tricks - Baltimore Perl mongers
 
The Joy of Smartmatch
The Joy of SmartmatchThe Joy of Smartmatch
The Joy of Smartmatch
 
Perl Sucks - and what to do about it
Perl Sucks - and what to do about itPerl Sucks - and what to do about it
Perl Sucks - and what to do about it
 
How to stand on the shoulders of giants
How to stand on the shoulders of giantsHow to stand on the shoulders of giants
How to stand on the shoulders of giants
 
M09-Cross validating-naive-bayes
M09-Cross validating-naive-bayesM09-Cross validating-naive-bayes
M09-Cross validating-naive-bayes
 
TDOH 南區 WorkShop 2016 Reversing on Windows
TDOH 南區 WorkShop 2016 Reversing on WindowsTDOH 南區 WorkShop 2016 Reversing on Windows
TDOH 南區 WorkShop 2016 Reversing on Windows
 
Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Python
 
The Magic Of Tie
The Magic Of TieThe Magic Of Tie
The Magic Of Tie
 
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
 
zinno
zinnozinno
zinno
 
C99
C99C99
C99
 

En vedette

Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicsAmol Kunde
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicsprateek kumar
 
What is comparative genomics
What is comparative genomicsWhat is comparative genomics
What is comparative genomicsUsman Arshad
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicshemantbreeder
 
Database management system
Database management systemDatabase management system
Database management systemashishkthakur94
 
Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)Leighton Pritchard
 
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...CGIAR Generation Challenge Programme
 
Protein database ..... of NCBI
Protein database ..... of NCBI Protein database ..... of NCBI
Protein database ..... of NCBI Alagppa University
 
Comparative genomics presentation
Comparative genomics presentationComparative genomics presentation
Comparative genomics presentationEmmanuel Aguon
 
Startup india agri start-ups
Startup india   agri start-upsStartup india   agri start-ups
Startup india agri start-upsSahil Swangla
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsPawan Kumar
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsajay301
 
Development of Agriculture Sector in Malaysia
Development of Agriculture Sector in MalaysiaDevelopment of Agriculture Sector in Malaysia
Development of Agriculture Sector in Malaysiasuraya izad
 
Bioinformatics Final Presentation
Bioinformatics Final PresentationBioinformatics Final Presentation
Bioinformatics Final PresentationShruthi Choudary
 

En vedette (20)

Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
What is comparative genomics
What is comparative genomicsWhat is comparative genomics
What is comparative genomics
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
GRM 2011: Theme 1 -- Comparative genomics
GRM 2011: Theme 1 -- Comparative genomicsGRM 2011: Theme 1 -- Comparative genomics
GRM 2011: Theme 1 -- Comparative genomics
 
Database management system
Database management systemDatabase management system
Database management system
 
Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)Microbial Genomics and Bioinformatics: BM405 (2015)
Microbial Genomics and Bioinformatics: BM405 (2015)
 
03 Object Dbms Technology
03 Object Dbms Technology03 Object Dbms Technology
03 Object Dbms Technology
 
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...
TLIII: Overview of TLII achievements, lessons and challenges for Phase III – ...
 
Protein database ..... of NCBI
Protein database ..... of NCBI Protein database ..... of NCBI
Protein database ..... of NCBI
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
genomic comparison
genomic comparison genomic comparison
genomic comparison
 
Comparative genomics presentation
Comparative genomics presentationComparative genomics presentation
Comparative genomics presentation
 
Startup india agri start-ups
Startup india   agri start-upsStartup india   agri start-ups
Startup india agri start-ups
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Development of Agriculture Sector in Malaysia
Development of Agriculture Sector in MalaysiaDevelopment of Agriculture Sector in Malaysia
Development of Agriculture Sector in Malaysia
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
Bioinformatics principles and applications
 
Bioinformatics Final Presentation
Bioinformatics Final PresentationBioinformatics Final Presentation
Bioinformatics Final Presentation
 

Similaire à Comparative Genomics with GMOD and BioPerl

ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsJan Aerts
 
APIs and Synthetic Biology
APIs and Synthetic BiologyAPIs and Synthetic Biology
APIs and Synthetic BiologyUri Laserson
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryJan Aerts
 
Hidden treasures of Ruby
Hidden treasures of RubyHidden treasures of Ruby
Hidden treasures of RubyTom Crinson
 
Distributed load testing with k6
Distributed load testing with k6Distributed load testing with k6
Distributed load testing with k6Thijs Feryn
 
Managing MariaDB Server operations with Percona Toolkit
Managing MariaDB Server operations with Percona ToolkitManaging MariaDB Server operations with Percona Toolkit
Managing MariaDB Server operations with Percona ToolkitSveta Smirnova
 
Down the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoDown the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoRemco Wendt
 
03 - Refresher on buffer overflow in the old days
03 - Refresher on buffer overflow in the old days03 - Refresher on buffer overflow in the old days
03 - Refresher on buffer overflow in the old daysAlexandre Moneger
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+ConFoo
 
Pygrunn 2012 down the rabbit - profiling in python
Pygrunn 2012   down the rabbit - profiling in pythonPygrunn 2012   down the rabbit - profiling in python
Pygrunn 2012 down the rabbit - profiling in pythonRemco Wendt
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Workhorse Computing
 
Ruby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersRuby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersAaron Patterson
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificityNeil Saunders
 
Php tips-and-tricks4128
Php tips-and-tricks4128Php tips-and-tricks4128
Php tips-and-tricks4128PrinceGuru MS
 
PERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsPERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsSunil Kumar Gunasekaran
 
04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)Alexandre Moneger
 
Gazelle - Plack Handler for performance freaks #yokohamapm
Gazelle - Plack Handler for performance freaks #yokohamapmGazelle - Plack Handler for performance freaks #yokohamapm
Gazelle - Plack Handler for performance freaks #yokohamapmMasahiro Nagano
 

Similaire à Comparative Genomics with GMOD and BioPerl (20)

ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
 
APIs and Synthetic Biology
APIs and Synthetic BiologyAPIs and Synthetic Biology
APIs and Synthetic Biology
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discovery
 
Hidden treasures of Ruby
Hidden treasures of RubyHidden treasures of Ruby
Hidden treasures of Ruby
 
Distributed load testing with k6
Distributed load testing with k6Distributed load testing with k6
Distributed load testing with k6
 
Managing MariaDB Server operations with Percona Toolkit
Managing MariaDB Server operations with Percona ToolkitManaging MariaDB Server operations with Percona Toolkit
Managing MariaDB Server operations with Percona Toolkit
 
Down the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoDown the rabbit hole, profiling in Django
Down the rabbit hole, profiling in Django
 
03 - Refresher on buffer overflow in the old days
03 - Refresher on buffer overflow in the old days03 - Refresher on buffer overflow in the old days
03 - Refresher on buffer overflow in the old days
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
 
Pygrunn 2012 down the rabbit - profiling in python
Pygrunn 2012   down the rabbit - profiling in pythonPygrunn 2012   down the rabbit - profiling in python
Pygrunn 2012 down the rabbit - profiling in python
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!
 
Ruby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersRuby on Rails: Tasty Burgers
Ruby on Rails: Tasty Burgers
 
DataMapper
DataMapperDataMapper
DataMapper
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
 
Php tips-and-tricks4128
Php tips-and-tricks4128Php tips-and-tricks4128
Php tips-and-tricks4128
 
PERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsPERL for QA - Important Commands and applications
PERL for QA - Important Commands and applications
 
04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)
 
Gazelle - Plack Handler for performance freaks #yokohamapm
Gazelle - Plack Handler for performance freaks #yokohamapmGazelle - Plack Handler for performance freaks #yokohamapm
Gazelle - Plack Handler for performance freaks #yokohamapm
 
BioMake BOSC 2004
BioMake BOSC 2004BioMake BOSC 2004
BioMake BOSC 2004
 
Spark_Documentation_Template1
Spark_Documentation_Template1Spark_Documentation_Template1
Spark_Documentation_Template1
 

Plus de Jason Stajich

Fungi DB at Keystone meeting
Fungi DB at Keystone meetingFungi DB at Keystone meeting
Fungi DB at Keystone meetingJason Stajich
 
MOMY FungiDB tutorial
MOMY FungiDB tutorialMOMY FungiDB tutorial
MOMY FungiDB tutorialJason Stajich
 
Stajich phyloseminar 2011 06-29
Stajich phyloseminar 2011 06-29Stajich phyloseminar 2011 06-29
Stajich phyloseminar 2011 06-29Jason Stajich
 
2011 05-02 fungi db-iccc
2011 05-02 fungi db-iccc2011 05-02 fungi db-iccc
2011 05-02 fungi db-icccJason Stajich
 
2009 11 09 UCLA Bioinformatics Talk
2009 11 09 UCLA Bioinformatics Talk2009 11 09 UCLA Bioinformatics Talk
2009 11 09 UCLA Bioinformatics TalkJason Stajich
 
2009 11 16 UCR Comp Sci
2009 11 16 UCR Comp Sci2009 11 16 UCR Comp Sci
2009 11 16 UCR Comp SciJason Stajich
 
Evolution and exploration of the transcriptional landscape in two filamentous...
Evolution and exploration of the transcriptional landscape in two filamentous...Evolution and exploration of the transcriptional landscape in two filamentous...
Evolution and exploration of the transcriptional landscape in two filamentous...Jason Stajich
 
Connecting ultrastructure to molecules
Connecting ultrastructure to moleculesConnecting ultrastructure to molecules
Connecting ultrastructure to moleculesJason Stajich
 
Connecting ultrastructure to molecules: Studying evolution of molecular comp...
Connecting ultrastructure to molecules: Studying evolution  of molecular comp...Connecting ultrastructure to molecules: Studying evolution  of molecular comp...
Connecting ultrastructure to molecules: Studying evolution of molecular comp...Jason Stajich
 
Comparative genomics of the fungal kingdom
Comparative genomics of the fungal kingdomComparative genomics of the fungal kingdom
Comparative genomics of the fungal kingdomJason Stajich
 
Comparative genomics of fung: studying adaptation through gene family evolution
Comparative genomics of fung: studying adaptation through gene family evolutionComparative genomics of fung: studying adaptation through gene family evolution
Comparative genomics of fung: studying adaptation through gene family evolutionJason Stajich
 
Bioinformatics and BioPerl
Bioinformatics and BioPerlBioinformatics and BioPerl
Bioinformatics and BioPerlJason Stajich
 
Evolution of gene family size change in fungi
Evolution of gene family size change in fungiEvolution of gene family size change in fungi
Evolution of gene family size change in fungiJason Stajich
 
BioPerl: The evolution of a Bioinformatics Toolkit
BioPerl: The evolution of a Bioinformatics ToolkitBioPerl: The evolution of a Bioinformatics Toolkit
BioPerl: The evolution of a Bioinformatics ToolkitJason Stajich
 
BioPerl: Developming Open Source Software
BioPerl: Developming Open Source SoftwareBioPerl: Developming Open Source Software
BioPerl: Developming Open Source SoftwareJason Stajich
 

Plus de Jason Stajich (15)

Fungi DB at Keystone meeting
Fungi DB at Keystone meetingFungi DB at Keystone meeting
Fungi DB at Keystone meeting
 
MOMY FungiDB tutorial
MOMY FungiDB tutorialMOMY FungiDB tutorial
MOMY FungiDB tutorial
 
Stajich phyloseminar 2011 06-29
Stajich phyloseminar 2011 06-29Stajich phyloseminar 2011 06-29
Stajich phyloseminar 2011 06-29
 
2011 05-02 fungi db-iccc
2011 05-02 fungi db-iccc2011 05-02 fungi db-iccc
2011 05-02 fungi db-iccc
 
2009 11 09 UCLA Bioinformatics Talk
2009 11 09 UCLA Bioinformatics Talk2009 11 09 UCLA Bioinformatics Talk
2009 11 09 UCLA Bioinformatics Talk
 
2009 11 16 UCR Comp Sci
2009 11 16 UCR Comp Sci2009 11 16 UCR Comp Sci
2009 11 16 UCR Comp Sci
 
Evolution and exploration of the transcriptional landscape in two filamentous...
Evolution and exploration of the transcriptional landscape in two filamentous...Evolution and exploration of the transcriptional landscape in two filamentous...
Evolution and exploration of the transcriptional landscape in two filamentous...
 
Connecting ultrastructure to molecules
Connecting ultrastructure to moleculesConnecting ultrastructure to molecules
Connecting ultrastructure to molecules
 
Connecting ultrastructure to molecules: Studying evolution of molecular comp...
Connecting ultrastructure to molecules: Studying evolution  of molecular comp...Connecting ultrastructure to molecules: Studying evolution  of molecular comp...
Connecting ultrastructure to molecules: Studying evolution of molecular comp...
 
Comparative genomics of the fungal kingdom
Comparative genomics of the fungal kingdomComparative genomics of the fungal kingdom
Comparative genomics of the fungal kingdom
 
Comparative genomics of fung: studying adaptation through gene family evolution
Comparative genomics of fung: studying adaptation through gene family evolutionComparative genomics of fung: studying adaptation through gene family evolution
Comparative genomics of fung: studying adaptation through gene family evolution
 
Bioinformatics and BioPerl
Bioinformatics and BioPerlBioinformatics and BioPerl
Bioinformatics and BioPerl
 
Evolution of gene family size change in fungi
Evolution of gene family size change in fungiEvolution of gene family size change in fungi
Evolution of gene family size change in fungi
 
BioPerl: The evolution of a Bioinformatics Toolkit
BioPerl: The evolution of a Bioinformatics ToolkitBioPerl: The evolution of a Bioinformatics Toolkit
BioPerl: The evolution of a Bioinformatics Toolkit
 
BioPerl: Developming Open Source Software
BioPerl: Developming Open Source SoftwareBioPerl: Developming Open Source Software
BioPerl: Developming Open Source Software
 

Dernier

Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024Adnet Communications
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationAnamaria Contreras
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCRashishs7044
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Doge Mining Website
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environmentelijahj01012
 
Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditNhtLNguyn9
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCRashishs7044
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 

Dernier (20)

Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement Presentation
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environment
 
Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal audit
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 

Comparative Genomics with GMOD and BioPerl

  • 1. BioPerl and Comparative Genomics • BioPerl - toolkit for Bioinformatics data • Fungal comparative genomics
  • 2. Overview of Toolkit Bioperl is... A Set of Perl modules for manipulating genomic and other biological data An Open Source Toolkit with many contributors A flexible and extensible system for doing bioinformatics data manipulation
  • 3. Some things you can do Read in sequence data from a file in standard formats (FASTA, GenBank, EMBL, SwissProt,...) Manipulate sequences, reverse complement, translate coding DNA sequence to protein. Parse a BLAST report, get access to every bit of data in the report
  • 4. Major Domains covered in Bioperl Sequences, Features, Annotations, Pairwise alignment reports Multiple Sequence Alignments Bibliographic data Graphical Rendering of sequence tracks Database for features and sequences
  • 5. Additional things Gene prediction parsers Trees, Parsing Phylogenetic and Molecular Evolution software output Population Genetic data and summary statistics Taxonomy Protein Structure
  • 6. Practical Examples Manipulate a DNA or Protein Sequence Read and write different Sequence formats Extract sequence annotations and features Parse a BLAST report
  • 7. Orienting to the code http://cvs.open-bio.org bioperl-live - Core packages bioperl-run - for running applications bioperl-ext - C language extension bioperl-db - bioperl BioSQL implementation bioperl-pedigree, bioperl-microarray are side-projects
  • 8. Website http://bioperl.org or http://bio.perl.org/ HOWTOs Frequently Asked Questions (FAQ) News Links to online Documentation
  • 9. Anatomy of a Bioperl Module perldoc Module -- perldoc Bio::SeqIO SYNOPSIS -- runnable code DESCRIPTION -- summary about the module Each module will have annotated functions
  • 10. A Tour: Core Objects Bioperl Sequences, Features, Locations, Annotations Sequence searching & pairwise alignments Multiple Sequence Alignments
  • 11. Sequences and Features TSS Feature PolyA site Exon Features Sequence Genomic Sequence with 3 exons 1 Transcript Start Site (TSS) 1 Poly-A Site
  • 12. Sequences, Features, and Annotations Sequence - DNA, RNA, AA Feature container Feature - Information with a Sequence Location Annotation - Information without explicit Sequence location
  • 13. Parsing Sequences Bio::SeqIO multiple drivers: genbank, embl, fasta,... Sequence objects Bio::PrimarySeq Bio::Seq Bio::Seq::RichSeq
  • 14. Look at the Sequence object Common (Bio::PrimarySeq) methods seq() - get the sequence as a string length() - get the sequence length subseq($s,$e) - get a subseqeunce translate(...) - translate to protein [DNA] revcom() - reverse complement [DNA] display_id() - identifier string description() - description string
  • 15. Using a Sequence my $str = “ATGAATGATGAA”; my $seq = Bio::PrimarySeq->new(-seq => $str, -display_id=”example”); print “id is “, $seq-display_id,” ”; print $seq-seq, “ ”; my $revcom = $seq-revcom; print $revcom-seq, “ ”; print “frame1=”,$seq-translate-seq,“ ”; id is example ATGAATGATGAA TTCATCATTCAT trans frame1=MNDE
  • 16. Sequence Features Bio::SeqFeatureI - interface - GFF derived start(), end(), strand() for location information location() - Bio::LocationI object (to represent complex locations) score,frame,primary_tag, source_tag - feature information spliced_seq() - for attached sequence, get the sequence spliced.
  • 18. has-a Bio::Annotation::Collection Bio::Seq Annotations Features has-a has-a Bio::SeqFeature::Generic Bio::Annotation::Comment has-a Bio::LocationI
  • 19. Reading and Writing Sequences Bio::SeqIO fasta, genbank, embl, swissprot,... Takes care of writing out associated features and annotations Two functions next_seq (reading sequences) write_seq (writing sequences)
  • 20. Writing a Sequence use Bio::SeqIO; # Let’s convert swissprot to fasta format my $in = Bio::SeqIO-new(-format = ‘swiss’, -file = ‘file.sp’); my $out = Bio::SeqIO-new(-format = ‘fasta’, -file = ‘file.fa’);` while( my $seq = $in-next_seq ) { $out-write_seq($seq); }
  • 21. v Mon Nov 17 12:38:43 2003 1 opt E() 20 323 0:== 22 0 0: one = represents 184 library sequences 24 2 0:= 26 12 2:* 28 61 26:* 30 211 157:*= 32 664 607:===* 34 1779 1645:========*= 36 3558 3379:==================*= 38 5908 5584:==============================*== 40 8049 7790:==========================================*= 42 10001 9522:===================================================*=== 44 10660 10503:=========================================================* 46 10987 10698:==========================================================*= 48 10332 10242:=======================================================*= 50 9053 9346:==================================================* 52 7736 8217:=========================================== * 54 6828 7018:======================================* 56 5448 5863:============================== * 58 4484 4813:========================= * 60 3818 3899:=====================* Sequence Database 62 2942 3126:================* 64 2407 2486:=============* 66 1866 1965:==========* 68 1495 1545:========* 70 1169 1211:======* 72 886 946:=====* 74 708 738:====* 76 542 574:===* Searching 78 451 446:==* 80 355 347:=* 82 271 265:=* 84 211 210:=* 86 151 163:* 88 104 126:* inset = represents 3 library sequences 90 101 97:* 92 78 75:* :========================*= 94 56 58:* :===================* 96 38 45:* :============= * 98 26 35:* :========= * 100 26 27:* :========* 102 20 21:* :======* 104 13 16:* :=====* 106 22 12:* :===*==== 108 10 10:* :===* 110 5 7:* :==* 112 4 6:* :=* 114 4 4:* :=* 116 3 3:* :* 118 9 3:* :*== 120 110 2:* :*====================================
  • 22. A Detailed look at BLAST parsing 3 Components Result: Bio::Search::Result::ResultI Hit: Bio::Search::Hit::HitI HSP: Bio::Search::HSP::HSPI
  • 23. use Bio::SearchIO; my $cutoff = ’0.001’; my $file = ‘BOSS_Ce.BLASTP’, my $in = new Bio::SearchIO(-format = ‘blast’, -file = $file); while( my $r = $in-next_result ) { print Query is: , $r-query_name, , $r-query_description, ,$r-query_length, aa ; print Matrix was , $r-get_parameter(’matrix’), ; while( my $h = $r-next_hit ) { last if $h-significance $cutoff; print Hit is , $h-name, ; while( my $hsp = $h-next_hsp ) { print HSP Len is , $hsp-length(’total’), , E-value is , $hsp-evalue, Bit score , $hsp-score, , Query loc: ,$hsp-query-start, , $hsp-query-end, , Sbject loc: ,$hsp-hit-start, , $hsp-hit-end, ; } } }
  • 24. BLAST Report Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA. All Rights Reserved. Reference: Gish, W. (1996-2000) http://blast.wustl.edu Query= BOSS_DROME Bride of sevenless protein precursor. (896 letters) Database: wormpep87 20,881 sequences; 9,238,759 total letters. Searching....10....20....30....40....50....60....70....80....90....100% done Smallest Sum High Probability Sequences producing High-scoring Segment Pairs: Score P(N) N F35H10.10 CE24945 status:Partially_confirmed TR:Q20073... 182 4.9e-11 1 M02H5.2 CE25951 status:Predicted TR:Q966H5 protein_id:... 86 0.15 1 ZC506.4 CE01682 locus:mgl-1 metatrophic glutamate recept... 91 0.18 1 F23D12.2 CE05700 status:Partially_confirmed TR:Q19761 ... 73 0.45 3 F35H10.10 CE24945 status:Partially_confirmed TR:Q20073 protein_id:AAA81683.2 Length = 1404 Score = 182 (69.1 bits), Expect = 4.9e-11, P = 4.9e-11 Identities = 75/315 (23%), Positives = 149/315 (47%) Query: 511 YPFLFDGESVMFWRIKMDTWVATGLTAAILGLIATLAILVFIVVRISLGDVFEGNPTTSI 570 Y +F+ + WR +V L ++ + +A+LV ++V++ L V +GN + I Sbjct: 1006 YQSVFEHITTGHWRDHPHNYVLLALITVLV--VVAIAVLVLVLVKLYLR-VVKGNQSLGI 1062
  • 25. BLAST Script Results Query is: BOSS_DROME Bride of sevenless protein precursor. 896 aa Matrix was BLOSUM62 Hit is F35H10.10 HSP Len is 315 E-value is 4.9e-11 Bit score 182 Query loc: 511 813 Sbject loc: 1006 1298 HSP Len is 28 E-value is 1.4e-09 Bit score 39 Query loc: 508 535 Sbject loc: 427 454
  • 26. FASTA Parsing Script use Bio::SearchIO; my $cutoff = ’0.001; my $file = ‘BOSS_Ce.FASTP’, my $in = new Bio::SearchIO(-format = ‘fasta’, -file = $file); while( my $r = $in-next_result ) { print Query is: , $r-query_name, , $r-query_description, ,$r-query_length, aa ; print Matrix was , $r-get_parameter(’matrix’), ; while( my $h = $r-next_hit ) { last if $h-significance $cutoff; print Hit is , $h-name, ; while( my $hsp = $h-next_hsp ) { print HSP Len is , $hsp-length(’total’), , E-value is , $hsp-evalue, Bit score , $hsp-score, , Query loc: ,$hsp-query-start, , $hsp-query-end, , Sbject loc: ,$hsp-hit-start, , $hsp-hit-end, ; } } }
  • 27. FASTA Report FASTA searches a protein or DNA sequence data bank version 3.4t20 Sep 26, 2002 Please cite: W.R. Pearson D.J. Lipman PNAS (1988) 85:2444-2448 Query library BOSS_DROME.aa vs /blast/wormpep87 library searching /blast/wormpep87 library 1BOSS_DROME Bride of sevenless protein precursor. - 896 aa vs /blast/wormpep87 library 9238759 residues in 20881 sequences Expectation_n fit: rho(ln(x))= 5.6098+/-0.000519; mu= 12.8177+/- 0.030 mean_var=107.8223+/-22.869, 0's: 0 Z-trim: 2 B-trim: 0 in 0/62 Lambda= 0.123515 Kolmogorov-Smirnov statistic: 0.0333 (N=29) at 48 FASTA (3.45 Mar 2002) function [optimized, BL50 matrix (15:-5)] ktup: 2 join: 38, opt: 26, gap-pen: -12/-2, width: 16 Scan time: 9.680 The best scores are: opt bits E(20881) F35H10.10 CE24945 status:Partially_confirmed T (1404) 207 48.5 6.8e-05 T06E4.11 CE06377 locus:pqn-63 status:Predicted ( 275) 122 32.6 0.8 C33B4.3 CE01508 ankyrin and proline rich domain (1110) 124 33.6 1.6 Y48C3A.8 CE22141 status:Predicted TR:Q9NAG3 pr ( 291) 110 30.5 3.7 Y34D9A.2 CE30217 status:Partially_confirmed TR ( 326) 108 30.2 5.1 K06H7.3 CE26941 Isopentenyl-diphosphate delta i ( 618) 107 30.3 8.9 F44B9.8 CE29044 ARPA status:Partially_confirmed ( 388) 104 29.5 9.4 F35H10.10 CE24945 status:Partially_confirmed TR:Q20 (1404 aa) initn: 94 init1: 94 opt: 207 Z-score: 197.9 bits: 48.5 E(): 6.8e-05 Smith-Waterman score: 275; 22.527% identity (27.152% ungapped) in 728 aa overlap (207-847:640-1330) 180 190 200 210 220 230 BOSS_D RAISIDNASLAENLLIQEVQFLQQCTTYSMGIFVDWELYKQLESVIKD---LEYNIWPIP
  • 28. FASTA Script Results Query is: BOSS_DROME Bride of sevenless protein precursor. 896 aa Matrix was BL50 Hit is F35H10.10 HSP Len is 728 E-value is 6.8e-05 Bit score 197.9 Query loc: 207 847 Sbject loc: 640 1330
  • 29. Using the Search::Result object use Bio::SearchIO; use strict; my $parser = new Bio::SearchIO(-format = ‘blast’, -file = ‘file.bls’); while( my $result = $parser-next_result ){ print “query name=“, $result-query_name, “ desc=”, $result-query_description, “, len=”,$result- query_length,“ ”; print “algorithm=“, $result-algorithm, “ ”; print “db name=”, $result-database_name, “ #lets=”, $result-database_letters, “ #seqs=”,$result- database_entries, “ ”; print “available params “, join(’,’, $result-available_parameters),” ”; print “available stats “, join(’,’, $result-available_statistics), “ ”; print “num of hits “, $result-num_hits, “ ”; }
  • 30. Using the Search::Hit Object use Bio::SearchIO; use strict; my $parser = new Bio::SearchIO(-format = ‘blast’, -file = ‘file.bls’); while( my $result = $parser-next_result ){ while( my $hit = $result-next_hit ) { print “hit name=”,$hit-name, “ desc=”, $hit-description, “ len=”, $hit-length, “ acc=”, $hit-accession, ” ”; print “raw score “, $hit-raw_score, “ bits “, $hit-bits, “ significance/evalue=“, $hit-evalue, “ ”; } }
  • 31. Cool Hit Methods start(), end() - get overall alignment start and end for all HSPs strand() - get best overall alignment strand matches() - get total number of matches across entire set of HSPs (can specify only exact ‘id’ or conservative ‘cons’)
  • 32. Using the Search::HSP Object use Bio::SearchIO; use strict; my $parser = new Bio::SearchIO(-format = ‘blast’, -file = ‘file.bls’); while( my $result = $parser-next_result ){ while( my $hit = $result-next_hit ) { while( my $hsp = $hit-next_hsp ) { print “hsp evalue=“, $hsp-evalue, “ score=“ $hsp-score, “ ”; print “total length=“, $hsp-hsp_length, “ qlen=”, $hsp-query-length, “ hlen=”,$hsp-hit-length, “ ”; print “qstart=”,$hsp-query-start, “ qend=”,$hsp-query-end, “ qstrand=“, $hsp-query-strand, “ ”; print “hstart=”,$hsp-hit-start, “ hend=”,$hsp-hit-end, “ hstrand=“, $hsp-hit-strand, “ ”; print “percent identical “, $hsp-percent_identity, “ frac conserved “, $hsp-frac_conserved(), “ ”; print “num query gaps “, $hsp-gaps(’query’), “ ”; print “hit str =”, $hsp-hit_string, “ ”; print “query str =”, $hsp-query_string, “ ”; print “homolog str=”, $hsp-homology_string, “ ”; } }
  • 33. Cool HSP methods rank() - order in the alignment (which you could have requested, by score, size) matches - overall number of matches seq_inds - get a list of numbers representing residue positions which are conserved, identical, mismatches, gaps
  • 34. SearchIO system BLAST (WU-BLAST, NCBI, XML, PSIBLAST, BL2SEQ, MEGABLAST, TABULAR (-m8/m9)) FASTA (m9 and m0) HMMER (hmmpfam, hmmsearch) UCSC formats (WABA, AXT, PSL) Gene based alignments Exonerate, SIM4, {Gene,Genome}wise
  • 35. SearchIO reformatting Supports output of Search reports as well Bio::SearchIO::Writer “BLAST flavor” HTML, Text Tabular Report Format
  • 36. Bioperl Reformatted HTML of BLASTP Search Report for gi|6319512|ref|NP_009594.1| BLASTP 2.0MP-WashU [04-Feb-2003] [linux24-i686-ILP32F64 2003-02-04T19:05:09] Copyright (C) 1996-2000 Washington University, Saint Louis, Missouri USA. All Rights Reserved. Reference: Gish, W. (1996-2000) http://blast.wustl.edu Hit Overview Score: Red= (=200), Purple 200-80, Green 80-50, Blue 50-40, Black 40 Build custom overview (Bio::Graphics) Query= gi|6319512|ref|NP_009594.1| chitin synthase 2; Chs2p [Saccharomyces cerevisiae] (963 letters) Database: cneoA_WI.aa 9,645 sequences; 2,832,832 total letters Hyperlink to external Score E resources Sequences producing significant alignments: (bits) value cneo_WIH99_157.Gene2 Start=295 End=4301 Strand=1 Length=912 ExonCt=24 1650 1.6e-173 cneo_WIH99_63.Gene181 Start=154896 End=151527 Strand=-1 Length=876 ExonCt=13 1441 3.9e-149 Hyperlink to cneo_WIH99_133.Gene1 Start=15489 End=19943 Strand=1 Length=1017 ExonCt=23 1357 3e-142 alignment part cneo_WIH99_45.Gene2 Start=84 End=3840 Strand=1 Length=839 ExonCt=25 1311 1.5e-138 of report cneo_WIH99_112.Gene165 Start=122440 End=118921 Strand=-1 Length=1036 ExonCt=9 198 1.2e-15 cneo_WIH99_11.Gene7 Start=39355 End=42071 Strand=1 Length=761 ExonCt=9 172 6.4e-13 cneo_WIH99_60.Gene9 Start=36153 End=32819 Strand=-1 Length=1020 ExonCt=5 166 1.2e-12 cneo_WIH99_106.Gene88 Start=242538 End=238790 Strand=-1 Length=1224 ExonCt=3 157 6.3e-09
  • 37.
  • 38. Turning BLAST into HTML use Bio::SearchIO; use Bio::SearchIO::Writer::HTMLResultWriter; my $in = new Bio::SearchIO(-format = 'blast', -file = shift @ARGV); my $writer = new Bio::SearchIO::Writer::HTMLResultWriter(); my $out = new Bio::SearchIO(-writer = $writer -file = “file.html”); $out-write_result($in-next_result);
  • 39. Turning BLAST into HTML # to filter your output my $MinLength = 100; # need a variable with scope outside the method sub hsp_filter { my $hsp = shift; return 1 if $hsp-length('total') $MinLength; } sub result_filter { my $result = shift; return $hsp-num_hits 0; } my $writer = new Bio::SearchIO::Writer::HTMLResultWriter (-filters = { 'HSP' = hsp_filter} ); my $out = new Bio::SearchIO(-writer = $writer); $out-write_result($in-next_result); # can also set the filter via the writer object $writer-filter('RESULT', result_filter);
  • 40. Trees • Support reading of Tree data • newick, nexus, nhx formats • Manipulation of Trees • Trees are connections of Nodes which have ancestor and children pointers 40
  • 41. Simple Tree Routines LCA • Lowest Common Ancestor • Tests of monophyly, paraphyly outgroup • Reroot tree • Distances between two nodes • Find a node by name 41
  • 42. Constructing Trees • For Purely Bioperl Solution (only in post Bioperl 1.4 releases) • Bio::Align::DNAStatistics and ProteinStatistics can build pairwise distance matricies • Bio::Tree::DistanceFactory has UPGMA, Neighbor-Joining implemented 42
  • 43. Interfacing with Phylip • Bio::Tools::Run::Phylo::Phylip • ProtDist (also parser Bio::Matrix::IO) • Neighbor, ProtPars, SeqBoot, Consense • DrawGram, DrawTree 43
  • 44. Reading/Writing a Tree #!/usr/bin/perl -w use Bio::TreeIO; use strict; my $in = Bio::TreeIO-new(-format = ‘nexus’, -file = ‘trees.nex’); my $out = Bio::TreeIO-new(-format = ‘newick’, -file = ‘trees.nh’); while( my $tree = $in-next_tree ) { $out-write_tree($tree); } 44
  • 45. Fetching subset of nodes #!/usr/bin/perl -w use strict; use Bio::TreeIO; my $in = Bio::TreeIO-new(-format = ‘newick’, -fh = *DATA); if( my $tree = $in-next_tree ) { my @nodes = $tree-get_nodes; my @tips = grep { $_-is_Leaf } @nodes; print there are ,scalar @tips, tips ; my @internal = grep { ! $_-is_Leaf } @nodes; print there are ,scalar @internal, internal nodes ; my ($A_node) = $tree-find_node(-id = 'A'); print branch length of Ancestor of ,$A_node-id, is , $A_node-ancestor-branch_length, ; } __DATA__ (((A:10,B:11):2,C:5),((D:7,F:6):17,G:8)); 45
  • 46. Walking up the tree (tips to root) if( my $tree = $in-next_tree ) { my @tips = grep { $_-is_Leaf } $tree-get_nodes; for my $node ( @tips ) { my @path; while( defined $node) { push @path, $node-id; $node = $node-ancestor; } print join(“,“, @path), “ ”; } } __DATA__ (((A:10,B:11)I1:2,C:5)I2,((D:7,F:6)I3:17,G:8)I4)Root; A I1 C,I2,Root B I2 A,I1,I2,Root C B,I1,I2,Root Root G,I4,Root D I3 D,I3,I4,Root F I4 F,I3,I4,Root 46 G
  • 47. From Alignments to Trees • Bio::AlignIO to parse the alignment • Bio::Align::ProteinStatistics to compute pairwise distances • Bio::Tree::DistanceFactory to build a tree based on a matrix of distances using NJ or UPGMA • More sophisticated tree building should be done with tools like phyml, PAUP, MOLPHY, PHYLIP, MrBayes, or PUZZLE 47
  • 48. Testing phylogenetic hypotheses • No sophisticated ML methods are currently built in Bioperl for testing for phylogenetic correlations, etc • Can export trees and use tools like Mesquite 48
  • 49. Bioperl Population Genetics • Basic data • Marker - polymorphic region of genome • Individual - individual sampled • Genotype - observed allele(s) for a marker in an individual • Population - collection of individuals 49
  • 50. The Players • Bio::PopGen::Population - container for Individuals, calculate summary stats • Bio::PopGen::Individual - container for Genotypes • Bio::PopGen::Marker - summary info about a Marker (primers, genome loc, allele freq) • Bio::PopGen::Genotype - pairing of Individual Marker; allele container 50
  • 51. Population Genetics Data Objects Population Has Map Individual Has Has Marker Genotype Has 51
  • 52. Some Data Formats • PrettyBase format (Seattle SNP data) MARKER SAMPLEID ALLELE1 ALLELE2 ProcR2973EA 01 C T ProcR2973EA 02 N N ProcR2973EA 03 C C ProcR2973EA 04 C T ProcR2973EA 05 C C • Comma Delimited (CSV) SAMPLE,MARKERNAME1,MARKERNAME2,... SAMPLE,ProcR2973EA,Marker2 sample-01,C T,G T sample-02,C C,G G • Phase format 3 5 P 300 1313 1500 2023 5635 MSSSM #1 12 1 0 1 3 52 11 0 1 0 3
  • 53. Reading in Data use Bio::PopGen::IO; my $in = Bio::PopGen::IO-new(-format = ‘csv’ -file = ‘dat.csv’); my @pop; while( my $ind = $in-next_individual ) { push @pop, $ind; } # OR my $pop = $in-next_population; 53
  • 54. Ready Built Stuff • Bio::PopGen::PopStats - population level statistics (only FST currently) • Bio::PopGen::Statistics - suite of Population Genetics statistical tests and summary stats. • Bio::PopGen::Simulation::Coalescent - primitive Coalescent simulation • Basic tree topology and branch length assignment. 54
  • 55. Using the Modules use Bio::PopGen::Statistics; my $stats = Bio::PopGen::Statistics-new(); my $pi = $stats-pi($population); # or use an array reference of Individuals my $pi = $stats-pi(@individuals); # Tajima’s D my $TajimaD = $stats-tajima_D($population); # Fu and Li’s D my $FuLiD = $stats-fu_and_li_D($ingroup_pop, $outgroup_ind); # Fu and Li’s D* my $FLDstar = $stats-fu_and_li_D_star($population); # pairwise composite LD my %LDstats = $stats-composite_LD($population); my $LDarray = $LDstats{’marker1’}-{’marker2’}; my ($ldval,$chisq) = @$LDarray; 55
  • 56. Getting Data from Alignments • use Bio::AlignIO to read in Multiple Sequence Alignment data • Bio::PopGen::Utilities aln_to_population will build Population from MSA • Will make a “Marker” for every polymorphic site (or if asked every site) • Eventually will have ability to only get silent/non-silent coding sites 56
  • 57. Simulations • Simulation::Coalescent (Hudson 1990) • Generate random topology • Assign mutations to branches • Tree tips are AlleleNodes (Tree::Node PopGen::Individual) 57
  • 58. Coalescent Simulation use Bio::PopGen::Simulation::Coalescent; my $factory = new Bio::PopGen::Simulation::Coalescent( -sample_size = 10, -maxcount = 100); # max number trees my $tree = $factory-next_tree; $factory-add_Mutations($tree,20); my @inds = $tree-get_leaf_nodes; use Bio::PopGen::Statistics; my $pi = Bio::PopGen::Statistics-pi(@inds); my $D = Bio::PopGen::Statistics- tajima_D (@inds); 58
  • 59. Scripts scripts/popgen/composite_LD -i filename.prettybase [-o outfile.LD] [--format prettybase/csv] [--sortbyld] [--noconvertindels] ProcR2973EA 01 C T ProcR2973EA 02 N N ProcR2973EA 03 C C ProcR2973EA 04 C T ProcR2973EA 05 C C ProcR2973EA 06 C T ProcR2973EA 07 C C ProC9198EA,ProcR2973EA - LD=-0.0375, chisq=2.56 59
  • 60. Simple pairwise Ka, Ks rates estimation • Within Bioperl can calculate pairwise K ,K a s and Ka/Ks using Nei’s method. • Bio::Align::DNAStatistics • calc_KaKs_pair($alnobj,$seq1id,$seq2id) calc_all_KaKs_pairs($alnobj) 60
  • 61. Ortholog/Homolog finding • Doesn’t really require much of Bioperl to do this • Want to find best reciprocal BLAST/ FASTA/SSEARCH hits • Build up a matrix, where each query sequence has a list of its best hits 61
  • 62. my $reportfile = ‘/usr/local/bio/data/reports/fly_worm.FASTA.m9.gz’; open(IN,”zcat $reportfile |”) || die $!; my %data; while(IN) { my @line = split; my ($query,$subject,$pid,$alnlen) = @line; my ($evalue) = $line[10]; next if $query eq $subject; push @{$data{$query}}, [$subject, $evalue,$pid,$alnlen]; } my %seen; my %orthologs; # some tracking and data vars for my $q ( keys %data ) { next if $seen{$q}++; # skip if we’ve processed it already my ($s, $evalue,$pid,$alnlen) = @{$data{$q}-[0]}; my ($qsp) = split(/:/,$q); my ($ssp) = split(/:/,$s); $seen{$s}++; next if( $qsp eq $ssp || $pid 30 || (defined $data{$q}-[1] 1e-10 abs($data{$q}-[1]-[1] - $evalue) ) ); # second best hit # is too good if( $data{$s}-[0]-[0] eq $q ) { my $id = join(”-”, sort ($qsp,$ssp)); push @{$orthologs{$id}}, [$q,$s,$evalue,$pid,$alnlen]; } } 62
  • 63. # print out the ortholog pairs, do some sorting to insure that # order of names is consistent and order by evalue for my $id ( keys %orthologs ) { open(OUT, “$id.orthologs”) || die $!; for my $pair ( sort {$a-[2] = $b-[2] } @{$orthologs{$id}} ) { my ($q,$s,$evalue,$pid,$alnlen) = @$pair; ($q,$s) = sort ( $q, $s ); # assume prefixed print OUT join(” ”, ($q,$s,$evalue,$pid,$alnlen)), “ ”; } } 63
  • 64. Gene families • Families can be constructed with single- linkage - merging all homologs into a single cluster • Or using a tool like MCL (http://micans.org/ mcl) to build gene families from a matrix of pairwise distances (BLAST or FASTA scores) 64
  • 65. Automating PAML • PAML - phylogenetic analysis with maximum likelihood • Estimate synonymous and non-synonymous substitution rates • Along branches of a tree or in a pairwise fashion 65
  • 66. Preparing Data • Multiple sequence alignments of protein coding sequence • No stop codons! • Must be aligned on codon boundaries • Easiest way is to align at protein level, then project back into CDS alignment 66
  • 67. Doing Protein Alignments • Bio::Tools::Run::Alignment::Clustalw or Bio::Tools::Run::Alignment::MUSCLE or just prepare the sequence files and run the alignment programs via scripts • Bio::AlignIO to parse the alignment data • Bio::Align::Utilities to project back into CDS space 67
  • 68. Build tree or assume a tree • If doing analysis of genomes which have a know species tree - use that tree • Branch lengths are not part of PAML. Multiple topologies can be provided to test alternative hypotheses (by comparing maximum likelihood values) 68
  • 69. Running PAML #!/usr/bin/perl -w use strict; use Bio::Tools::Run::Phylo::PAML::Codeml; use Bio::AlignIO; my $factory = Bio::Tools::Run::Phylo::PAML::Codeml-new( -params = { ‘runmode’ = -2, ‘seqtype’ = 1}); my $alnio = Bio::AlignIO-new(-format = ‘clustalw’, -file = ‘cds.aln’); my $aln = $alnio-next_aln; # get the alignment from file $factory-alignment($aln); # set the alignment my ($returncode,$parser) = $factory-run(); my $result = $parser-next_result; my $MLmatrix = $result-get_MLMatrix; print Ka = , $MLmatrix-[0]-[1]-{’dN’}, ; print Ks = , $MLmatrix-[0]-[1]-{’dS’}, ; print Ka/Ks = , $MLmatrix-[0]-[1]-{’omega’}, ; 69
  • 70. Parsing PAML #!/usr/bin/perl -w use strict; use Bio::Tools::Phylo::PAML; my $parser = Bio::Tools::Phylo::PAML-new (-file = ‘results/mlc’, -dir = ‘results’); if( my $result = $parser-next_result ) { my @otus = $result-get_seqs; # get Nei Gojobori dN/dS matrix my $NGmatrix = $result-get_NGmatrix; printf “%s and %s dS=%.4f dN=%.4f Omega=%.4f ”, $otus[0]-display_id, $otus[1]-display_id, $NGMatrix-[0]-[1]-{dS}, $NGMatrix-[0]-[1]-{dN}, $NGMatrix-[0]-[1]-{omega}; } 70
  • 71. Getting the Trees out my @trees = $result-get_trees; for my $tree ( @trees ) { print “likelihood is “, $tree-score, “ ”; # do something else with the trees, # for non runmode -2 results # inspect the tree, branch specific rates # the t (time) parameter is available via # (omega, dN, etc.) are available via # ($omega) = $node-get_tag_values('omega'); for my $node ( $tree-get_nodes ) { print $node-id, “ t=”,$node-branch_length, “ omega “, $node-get_tag_values(‘omega’); } } 71
  • 72. Summary Lots of modules to do lots of things How to find out what exists? Read HOWTOs, bptutorial, Browse the docs website - http://doc.bioperl.org/ Ask on-list bioperl-l@bioperl.org
  • 73. Fungal Genomics • ~45 fungal genomes currently sequenced • 5-10 more in production at sequencing centers • Diverse sampling across the fungal tree • Clusters of closely related species • Cryptococcus, Aspergillus, Saccharomyces, Candida
  • 74. Genome Annotation Pipeline Rfam tRNAscan ZFF to GFF3 GFF to AA SNAP FASTA predicted Proteins proteins Twinscan all-vs-all GFF2 to GFF3 Bio::DB::GFF Tools::Glimmer Glimmer SearchIO protein to Genscan Genome genome Tools::Genscan coordinates homologs HMMER MCL SearchIO SearchIO and BLASTZ orthologs BLASTN Tools::GFF GLEAN GFF2 to GFF3 exonerate (combiner) Proteins Gene protein2genome families exonerate ESTs est2genome BOSC 2005
  • 75. Genome Browsing BOSC 2005
  • 76. Comparative Genomics • Find orthologs, align, build trees • Assess gene structure among orthologs • Find gene families • Assess gene structure • Copy number among genomes BOSC 2005
  • 77. Comparative Genomics: Gene Structure Species phylogeny Bio::Tree/TreeIO Bio::DB Bio::AignIO Bio::SimpleAlign Bio::SeqIO MUSCLE Alignments Score shared orthologs Bio::Coordinate with Introns intorn positions on Tree BOSC 2005
  • 78. Phylogenomics Species phylogeny Bio::TreeIO gene MrBayes families Bio::DB PhyML Classify tree Bio::SeqIO Bio::AignIO MUSCLE differences ProtML PUZZLE Bin genes by orthologs topology BOSC 2005
  • 79. Arabidopsis thaliana Dictyostelium discoideum Rhizopus oryzae Ustilago maydis Basidiomycota Cryptococcus neoformans Cryptococcus neoformans Cryptococcus neoformans Cryptococcus neoformans Coprinus cinereus Phanerochaete chrysosporium Schizosaccharomyces pombe Fungi Sclerotinia sclerotiorum Ascomycota Botrytis cinerea Magnaporthe grisea Sordiomycota Neurospora crassa Podospora anserina Chaetomium globosum Trichoderma reesi Euasco Fusarium verticillioides Fusarium graminearum Stagonospora nodorum Aspergillus terreus Aspgergillus Aspergillus nidulans Aspergillus fumigatus Eurotiomycota Histoplasma capsulatum Onygenales Coccidioides immitis Uncinocarpus reesii Yarrowia lipolytica Kluyveromyces lactis Ashbya gossypii Hemiascomycota Opistikant Candida glabrata Saccharomyces castellii Saccharomyces bayanus Saccharomyces kudriavzevii Saccharomyces mikatae Saccharomyces paradoxus Saccharomyces cerevisiae Saccharomyces Complex Saccharomyces cerevisiae Saccharomyces cerevisiae Saccharomyces kluyveri Kluyveromyces waltii Candida tropicalis Candida albicans Candida dubliniensis Candida guilliermondii Debaryomyces hansenii Candida lusitaniae Caenorhabdtis elegans Nematoda Caenorhabdtis briggsae Apis mellifera Metazoa Insecta Drosophila melanogaster Coelomata... Diptera Anopholes gambiae Ciona intestinalis Xenopus tropicalis Vertebrate Danio rerio Fish Fugu rubripes Tetraodon nigroviridis Gallus gallus Canis familiaris Mammalia Rattus norvegicus Rodenta Mus musculus PrimateRodent Homo sapiens Primate Pan troglodytes
  • 80. Rhizopus oryzae Ustilago maydis Basidiomycota Cryptococcus neoformans Cryptococcus neoformans Cryptococcus neoformans Cryptococcus neoformans Coprinus cinereus Phanerochaete chrysosporium Schizosaccharomyces pombe Fungi Sclerotinia sclerotiorum Ascomycota Botrytis cinerea Magnaporthe grisea Sordiomycota Neurospora crassa Podospora anserina Chaetomium globosum Trichoderma reesi Euasco Fusarium verticillioides Fusarium graminearum Stagonospora nodorum Aspergillus terreus Aspgergillus Aspergillus nidulans Aspergillus fumigatus Eurotiomycota Histoplasma capsulatum Onygenales Coccidioides immitis Uncinocarpus reesii Yarrowia lipolytica Kluyveromyces lactis Ashbya gossypii Hemiascomycota Opistikant Candida glabrata Saccharomyces castellii Saccharomyces bayanus Saccharomyces kudriavzevii Saccharomyces mikatae Saccharomyces paradoxus Saccharomyces cerevisiae Saccharomyces Complex Saccharomyces cerevisiae Saccharomyces cerevisiae Saccharomyces kluyveri Kluyveromyces waltii Candida tropicalis Candida albicans Candida dubliniensis Candida guilliermondii Debaryomyces hansenii Candida lusitaniae
  • 81. Consensus Splice Site Exon Exon AG GT G H.sapiens exon|intron H.sapiens intron|exon 2 2 C bits bits 1 1 A A T T G A G G T C A C G T C T T T 0 0 C A A T G T C C A C G A G C -2 -1 0 1 2 3 4 5 -5 -4 -3 -2 -1 0 1 5′ 3′ 5′ GTATGT AG weblogo.berkeley.edu weblogo.berkele S.cerevisiae exon|intron 2 S.cerevisiae intron|exon 2 bits 1 bits 1 T A C C A T 0 G T A A A G C C A -2 -1 0 1 2 3 4 5 T 0 G C 5′ 3′ A C -5 -4 -3 -2 -1 0 1 5′ GT G AG weblogo.berkeley.edu weblogo.berkele F.graminearium exon|intron F.graminearium intron|exon 2 2 bits bits 1 1 AA T C T GT A T G T 0 0 C C A C T C A A C T A G C A G -2 -1 0 1 2 3 4 5 -5 -4 -3 -2 -1 0 1 5′ 3′ 5′ AG GT G weblogo.berkeley.edu weblogo.berkele C.neoformans exon|intron C.neoformans intron|exon 2 2 bits bits 1 1 T T AA C G C G C T 0 0 TT T A A A C A G G C C A -2 -1 0 1 2 3 4 5 -5 -4 -3 -2 -1 0 1 5′ 3′ 5′ weblogo.berkeley.edu weblogo.berkele
  • 82. 2290 Arabidopsis thaliana 2685 Fugu rubripes 2746 2656 Mus musculus 6699 2767 2772 - 2772 2735 Homo sapiens 2748 803.6 2769 947 Rhizopus oryzae 1498 2109 1648 86 Ustilago maydis 2428 1490 2064 Cryptococcus neoformans 1578 1614 958.6 1615 Phanerochaete chrysosporium 1781 1519.9 1621 Coprinopsis cinerea 1605 214 Schizosaccharomyces pombe 1507 372 Fusarium graminearum 1803 349.7 1492 368 Magnaporthe grisea Podospora anserina 339.4 359 378.6 463 Chaetomium globosum 1128 513 1803 388.4 336 Neurospora crassa 640 1141 509 403 Stagonospora nodorum Aspergillus nidulans 469 410.2 630 481 Aspergillus fumigatus 474.0 1184 476.0 474 Aspergillus terreus 972 30 Yarrowia lipolytica 5 Debaryomyces hansenii 27 29 Saccharomyces cerevisiae 7 7 28 7 6 Candida glabrata 7 6 Kluyveromyces lactis 7/6 7 Ashbya gossypii 0.1
  • 83. Intron loss • If loss predominates, how are introns lost? • Fink (1987) model proposes mRNA integration into genome. • Boeke et al (1985) showed loss intron through RNA intermediate which is integrated into genome. • Large scale comparisons are too far awy to determine recent loss events.
  • 84. Searching for Intron loss in C. neoformans • Average of 5 introns per gene (Loftus et al, 2005) • A Average Ks between var. grubii and var. gattii 0.22 D (roughly mouse-rat B/C divergence), C. gattii vs C. neoformans Ks is 0.35 • 5133 4-way orthologs
  • 85. Loss/gain events observed • 25 out of the 5133 loci had loss or gain events. • 2 had evidence 4 or more intron loss/gain events. • CNI01550, putative RNA helicase missing 10 introns in var grubii (serotype A). • ~5-7 singleton introns in one of the Cryptococcus spp indicative of intron gain.
  • 86. 10 introns lost in C. neoformans var grubii 2 3 5 6 78 9 10 11 12 13 1415 16 17 18 19 20 21 22 1 2 3 5 6 78 9-19 20 21 22 C.gattii var grubii var neoformans } }
  • 87. Happened at the locus and was a perfect deletion 1k 2k 3k 4k 5k 6k 7k 8k 9k 10k 11k CNI01560 CNI01540 CNI01550 C.neoformans var neoformans 89% nt identity 92% nt identity 86% nt identity C.neoformans var grubii 85% nt identity 88% nt identity 85% nt identity C.gattii
  • 88. Other Research Projects • Multigene phylogenies of the fungi • Gene family evolution • Sugar transporter family expansion in C.neoformans • Gene family evolution mammals (with M. Hahn)