Data formats
        in biology
                       Pierre Poulain
    pierre.poulain@univ-paris-diderot.fr

                                09/2011
Outline
1    Background

2    The problem

3    Sequences

4    Structures

5    A few precautions

6    Conclusion

7    References & image credits
PP                    Université Paris Diderot - Paris 7   2
1    Background
The central dogma of biology

       DNA  --transcription-->  RNA  --translation-->  protein
2    The problem
Experimentally
       DNA              RNA                               protein
      A,T,C,G          A,U,C,G                           V,G,W,C...


     AAGATGACCGTGTGTCAT
     TTGATCCTGAACTGTTTG
     AAAAAATGTTCCGTGACG
     GACTCTTTGATGATGAGA
     CCTCGGAAGTAACGGAGC
     AGCGCAATGTTCCGTGAC
     CAGCTGACAATGTATCAG
     ATTCCAGACTGGATCAGA
     TCTGAATGCCATTAGCTT
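The three alphabets above are enough to tell the molecule types apart in simple cases. The following is a minimal Python sketch. It assumes plain sequences without ambiguity codes (N, X, ...). `guess_alphabet` and `transcribe` are hypothetical helper names. Transcription is modelled naively as copying the DNA coding strand with U instead of T:

```python
# Alphabets of the three molecule types shown above (no ambiguity codes).
DNA = set("ATCG")
RNA = set("AUCG")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def guess_alphabet(seq: str) -> str:
    """Return 'DNA', 'RNA' or 'protein' depending on the letters used in `seq`."""
    letters = set(seq.upper())
    # Order matters: a short string such as 'GAC' fits all three alphabets.
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    if letters <= PROTEIN:
        return "protein"
    raise ValueError(f"{sorted(letters)} do not fit a single alphabet")


def transcribe(coding_strand: str) -> str:
    """Naive transcription: same sequence as the DNA coding strand, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    dna = "AAGATGACCGTGTGTCAT"
    rna = transcribe(dna)
    print(rna)                  # AAGAUGACCGUGUGUCAU
    print(guess_alphabet(dna))  # DNA
    print(guess_alphabet(rna))  # RNA
```

Short strings are ambiguous: 'GAC' could just as well be a tripeptide. This is one reason the molecule type is worth recording alongside the sequence.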
Sequences > structures
TTGTCACCTGTACACTGGCATTACTACACAGAAACCCAGATGTCCGTTACC
AAGATGACCGTGTGTCATTCATTCCTAAGATTCAAAATGATTTCGATGGCA
TTGATCCTGAACTGTTTGAATTGAGAAAAGCTGTTATGGACACCAATGAAA
AAAAAATGTTCCGTGACGACACTTTCGGCAAGAACCTGAATGCAAACACAA
GACTCTTTGATGATGAGACTAGTTCATCCTCTTTTAAGCAAAATTCCTCTC
CCTCGGAAGTAACGGAGCAACCTGTGCAACCAACCTCCGCTGTCATGGGTA
GCTTCTTGTCTCCACAGTACCAACGTGCGTCATCTGCTTCTCGTACTAATC
ATAATACAAGCACCTCCAGTTTAATGAAGCCTGAATCAAGTCTCTACCTGG
ATAAATCATATTCGCATTTTAACAACAACGGCAGCAACGAAAACGCCCGCA
CATATTTGTAATCCAATATATACTCACATGTAACAACTTATTATATAAATA
AAGGATATCCTACATTATATTTCATAGAAAACCGCTCAAAAAGGTGTATTA
CATCCCAACACCACACATATTTCAGCGATAAAAACCTTAAATGTGAAATTC
CTGCTTCCTTAAATGTACGCAATTGCCGCTTTTTTCTGACATCTTTTTTGA
AAGGAAACAGATCCTCCAGAAGGGATTTACTGTTGGCTATTTTGTGTTAGA
ATAATAGATTAGGTTGCGTAAGTCATGGTCGAAAATAGTACGCAGAAGGCC
GGAAATGATGATAATAGCTCTACCAAGCCATATTCGGAGGCGTTTTTCTTA
AACCCAACGCCTGGATTAGAAGCTGAGCACTCAAGCACATCGCCTGCCCCC
AACTTGAAAATCGGTATGCTATTATCAATGCTTTACAATTCTGTCGGTTAC
GAGGATCATTGCCCTCAAGGTGGCGAATATTCGGATTTATTGAGAAATTTG
TGTGAAGCTATTTTGCCATCTTACGAAATTATTGAACGCTACAAGAACCAC
A lot of data
that you will be handling
3    Sequences
Sequences
                                nucleic acids, proteins
FASTA format

The simplest format


>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
FASTA layout


>header
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 cha
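Cutting a sequence into lines of fixed width takes only a slicing loop. This is a minimal sketch: `to_fasta` is a hypothetical name. The default width of 80 follows the maximum stated above; the cytochrome b example uses 70.

```python
def to_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Format one FASTA record: '>' + header, then the sequence in lines of `width` characters."""
    if not 0 < width <= 80:
        raise ValueError("width must be between 1 and 80")
    lines = [">" + header]
    # Consecutive slices of `width` characters; only the last one can be shorter.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("demo sequence", "ACGT" * 30, width=50), end="")
```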
Remarks

                   '>' is attached directly to the header (no space)

            every sequence line has the same fixed length, except the last one

     extensions .fasta, .seq, .fas, .fna, .faa

        Python: strings + lists
                    (+ Biopython)
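The two layout rules above can be checked automatically. This is a minimal sketch for a file holding a single record. `check_fasta_layout` is a hypothetical helper; it returns human-readable problems instead of raising:

```python
from __future__ import annotations


def check_fasta_layout(text: str) -> list[str]:
    """Return the layout problems of a single-record FASTA text (empty list if none)."""
    problems = []
    lines = text.rstrip("\n").split("\n")
    header, seq_lines = lines[0], lines[1:]
    # Rule 1: '>' is directly followed by the header text.
    if not header.startswith(">") or len(header) < 2 or header[1].isspace():
        problems.append("header must start with '>' immediately followed by text")
    if not seq_lines:
        problems.append("no sequence lines")
        return problems
    # Rule 2: all sequence lines share the same width (at most 80), except the last one.
    width = len(seq_lines[0])
    if width > 80:
        problems.append(f"lines are {width} characters long (more than 80)")
    for number, line in enumerate(seq_lines[:-1], start=2):  # line numbers in the file
        if len(line) != width:
            problems.append(f"line {number} has {len(line)} characters, expected {width}")
    if len(seq_lines[-1]) > width:
        problems.append("last line is longer than the others")
    return problems


if __name__ == "__main__":
    print(check_fasta_layout(">seq1\nACGTACGT\nACGTACGT\nACG\n"))  # []
    print(check_fasta_layout("> seq1\nACGTACGT\nACG\nACGTACGT\n"))
```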
Multifasta
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>gi|134252438|gb|ABO64984.1| cytochrome b [Elephantulus rupestris]
TAFSSVTHICRDVNYGWLIRYLHANGASLFFICLFIHVGRGIYYGSYLYFETWNIGVILLFITMATAFMG
YVLPWGQMSFWGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFL
HETGSNNPLGLVSDSDKIPFHPYYTIKDLLGVFAILILHLSLVLFSPDLLGDPDNYTPANPLNTPPHIKP
EWYFLFAYAILRSIPNKLGGVLALVLSILILIIFPLLHTSKQRSLMFRPISQCLFWVLVADLLTLTWIGG
QPVEHPYIIIGQLASILYFTIILVLMPIAGVIENHIIKL
>gi|157367467|gb|ABV45600.1| cytochrome b [Mammuthus primigenius]
MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHIC
RDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSF
WGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFALHFILPFTMIALAGVHLTFLHETGSNNPLG
LTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAI
LRSVPNKLGGILALLLSILILGMMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEHPYIII
GQMASILYFSIILAFLPIAGMIENYLIK
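Reading a multi-FASTA file needs only strings and lists. A new record starts at each '>' line, and the sequence lines that follow it are concatenated. This is a minimal sketch; `read_multifasta` is a hypothetical name:

```python
from __future__ import annotations


def read_multifasta(text: str) -> list[tuple[str, str]]:
    """Parse multi-FASTA text into (header, sequence) pairs, in file order."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = ">gi|1|gb|X1.1| first\nMKV\nLL\n>gi|2|gb|X2.1| second\nACD\n"
    for header, seq in read_multifasta(demo):
        # In 'gi|5524211|gb|AAD44166.1| ...' the 4th '|' field is the accession.
        print(header.split("|")[3], seq)  # X1.1 MKVLL, then X2.1 ACD
```

To read a file, pass its contents, e.g. `read_multifasta(open(path).read())` with your own `path`.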
Primary sequence databases


       GenBank – EMBL – DDBJ
GenBank   http://www.ncbi.nlm.nih.gov/
trypsin?
trypsin!
Example
LOCUS         NM_001001317             940 bp    mRNA          linear    PRI 27-DEC-2010
DEFINITION    Homo sapiens trypsin X3 (TRYX3), mRNA.
ACCESSION     NM_001001317
VERSION       NM_001001317.2 GI:170650697
[...]

FEATURES               Location/Qualifiers
     source            1..940
                       /organism="Homo sapiens"
                       /mol_type="mRNA"
                       /db_xref="taxon:9606"
                       /chromosome="7"
                       /map="7q34"
     gene              1..940
                       /gene="TRYX3"
                       /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
                       /note="trypsin X3"
                       /db_xref="GeneID:136541"
                       /db_xref="HPRD:15572"
[...]

ORIGIN
          1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
         61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
[...]
        781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
        841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
        901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
Example: the three parts of an entry
LOCUS         NM_001001317             940 bp    mRNA          linear    PRI 27-DEC-2010
DEFINITION    Homo sapiens trypsin X3 (TRYX3), mRNA.
ACCESSION     NM_001001317                                                     header
VERSION       NM_001001317.2 GI:170650697
[...]

FEATURES               Location/Qualifiers
     source            1..940
                       /organism="Homo sapiens"
                       /mol_type="mRNA"
                       /db_xref="taxon:9606"
                       /chromosome="7"
                       /map="7q34"
     gene              1..940                                                  features
                       /gene="TRYX3"
                       /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
                       /note="trypsin X3"
                       /db_xref="GeneID:136541"
                       /db_xref="HPRD:15572"
[...]

ORIGIN
          1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
         61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
[...]                                                                          sequence
        781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
        841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
        901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
Header
LOCUS       NM_001001317               940 bp     mRNA       linear      PRI 27-DEC-2010
                  |                      |         |                      |        |
                 name                 size      molecule               division   modification
                                                type                              date

ACCESSION   NM_001001317
                 |
                 accession number (unique and stable)

SOURCE      Homo sapiens (human)
                 |
                 organism name

 ORGANISM   Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
                 |
                taxonomy

REFERENCE   1 (bases 1 to 940)
  AUTHORS   Bubb,K.L., Bovee,D., Buckley,D., Haugen,E., Kibukawa,M.,
            Paddock,M., Palmieri,A., Subramanian,S., Zhou,Y., Kaul,R., Green,P.
            and Olson,M.V.
 TITLE      Scan of human genome reveals no new Loci under ancient balancing
            selection
 JOURNAL    Genetics 173 (4), 2165-2177 (2006)
  PUBMED    16751668
             |
            bibliographic reference
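For the entries shown in these slides, the LOCUS line can be split on whitespace. This is a minimal sketch. It assumes all seven fields are present: name, length, 'bp', molecule type, topology, division and date. `parse_locus` is a hypothetical name. The GenBank release notes define fixed column positions for this line, and a robust parser should follow those instead:

```python
def parse_locus(line: str) -> dict:
    """Split a nucleotide GenBank LOCUS line, as annotated above, into named fields."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("not a nucleotide LOCUS line with the expected 7 fields")
    return {
        "name": fields[1],
        "length": int(fields[2]),  # in base pairs
        "molecule": fields[4],     # DNA, mRNA, RNA...
        "topology": fields[5],     # linear or circular
        "division": fields[6],     # e.g. PRI = primate sequences
        "date": fields[7],         # date of last modification
    }


if __name__ == "__main__":
    locus = "LOCUS       NM_001001317               940 bp     mRNA       linear      PRI 27-DEC-2010"
    print(parse_locus(locus))
```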
Features
                    start and end of the gene
                     |         gene name
     gene            1..940     |
                     /gene="TRYX3"
                     /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
                     /note="trypsin X3"
                     /db_xref="GeneID:136541"
                     /db_xref="HPRD:15572"
                                   |
                                  identifiers in other databases
 coding sequence    start and end
      |               |
     CDS             110..835
                     /gene="TRYX3"
                     /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
                     /EC_number="3.4.21.4"
                     /note="trypsin-X3"           name of the protein produced
                     /codon_start=1                |
                     /product="trypsin-X3 precursor"
                     /protein_id="NP_001001317.1"
                     /db_xref="GI:48255915"
                     /db_xref="CCDS:CCDS5871.1"
                     /db_xref="GeneID:136541"
                     /db_xref="HPRD:15572"
                     /translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGV
                     LIHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDI
                     MLIKLKTEAELNDYVKLANLPYQTISENTMCSVSTWSYNVCDIYKEPDSLQTVNISVI
                     SKPQCRDAYKTYNITENMLCVGIVPGRRQPCKEVSAAPAICNGMLQGILSFADGCVLR
                     ADVGIYAKIFYYIPWIENVIQNN"
                        |
                       protein sequence
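GenBank positions are 1-based and both ends are included. Python slices are 0-based and exclude the end. This is a minimal sketch of the conversion. It handles simple 'start..end' locations only: no join(), no complement(), no '<' partial ends. `location_to_slice` is a hypothetical name:

```python
def location_to_slice(location: str) -> slice:
    """Convert a simple GenBank location 'start..end' (1-based, inclusive) into a Python slice."""
    start, end = (int(part) for part in location.split(".."))
    return slice(start - 1, end)


if __name__ == "__main__":
    # First 120 nucleotides of NM_001001317 (ORIGIN lines 1 and 61).
    sequence = ("aaggctggcaaaaaggagaccagacaggaggcgtctgtagagatatcatgaacttcaact"
                "tagctttgttttccagagactggagctaaactgggctttcaacatcatcatgaagtttat")
    print(sequence[location_to_slice("110..115")])  # atgaag: first two codons of the CDS (M, K)
    cds = location_to_slice("110..835")
    print(cds.stop - cds.start)  # 726 nucleotides, i.e. 242 codons
```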
Sequence

ORIGIN
        1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
       61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
      121 cctcctctgg gccctcttga atctgactgt tgctttggcc tttaatccag attacacagt
      181 cagctccact cccccttact tggtctattt gaaatctgac tacttgccct gcgctggagt
      241 cctgatccac ccgctttggg tgatcacagc tgcacactgc aatttaccaa agcttcgggt
      301 gatattgggg gttacaatcc cagcagactc taatgaaaag catctgcaag tgattggcta
      361 tgagaagatg attcatcatc cacacttctc agtcacttct attgatcatg acatcatgct
      421 aatcaagctg aaaacagagg ctgaactcaa tgactatgtg aaattagcca acctgcccta
      481 ccaaactatc tctgaaaata ccatgtgctc tgtctctacc tggagctaca atgtgtgtga
      541 tatctacaaa gagcccgatt cactgcaaac tgtgaacatc tctgtaatct ccaagcctca
      601 gtgtcgcgat gcctataaaa cctacaacat cacggaaaat atgctgtgtg tgggcattgt
      661 gccaggaagg aggcagccct gcaaggaagt ttctgctgcc ccggcaatct gcaatgggat
      721 gcttcaagga atcctgtctt ttgcggatgg atgtgttttg agagccgatg ttggcatcta
      781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
      841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
      901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//                            |
                             nucleotide sequence (here, the mRNA)
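To rebuild the sequence from the ORIGIN block, drop the position numbers and the spaces. This is a minimal sketch; `origin_to_sequence` is a hypothetical name. It assumes a complete record: the excerpts in these slides contain '[...]' lines, which it rejects.

```python
def origin_to_sequence(lines) -> str:
    """Concatenate the bases of GenBank ORIGIN lines, checking the position numbers."""
    parts = []
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("ORIGIN") or line == "//":
            continue
        # Each line: position of its first base, then blocks of 10 bases.
        position, *blocks = line.split()
        if int(position) != length + 1:
            raise ValueError(f"line starts at {position}, expected {length + 1}")
        part = "".join(blocks)
        parts.append(part)
        length += len(part)
    return "".join(parts)


if __name__ == "__main__":
    origin = [
        "ORIGIN",
        "        1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact",
        "       61 tagctttgtt ttccagagac",
        "//",
    ]
    print(len(origin_to_sequence(origin)))  # 80
```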
Remarks
                      extension .gbk

                  visualization: Artemis
     http://www.sanger.ac.uk/resources/software/artemis/


              EMBL format (.embl): similar to .gbk

          Python: strings/lists
               + regular expressions
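Regular expressions are well suited to the FEATURES qualifiers. This is a minimal sketch using Python's `re` module. It only handles qualifiers that fit on one line; a long /translation spread over several lines would need extra code. `parse_qualifiers` is a hypothetical name:

```python
import re

# One qualifier per line:  /key="value"  or  /key=value  (e.g. /codon_start=1).
# An unquoted value may not start with '"', so an unfinished multi-line value is skipped.
QUALIFIER = re.compile(r'^\s+/(\w+)=(?:"([^"]*)"|([^"\s]\S*))\s*$')


def parse_qualifiers(lines) -> dict:
    """Collect single-line qualifiers into {key: [values]} (keys such as db_xref repeat)."""
    found = {}
    for line in lines:
        match = QUALIFIER.match(line)
        if match:
            key, quoted, bare = match.groups()
            found.setdefault(key, []).append(quoted if quoted is not None else bare)
    return found


if __name__ == "__main__":
    features = [
        '     gene            1..940',
        '                     /gene="TRYX3"',
        '                     /db_xref="GeneID:136541"',
        '                     /db_xref="HPRD:15572"',
    ]
    print(parse_qualifiers(features))
    # {'gene': ['TRYX3'], 'db_xref': ['GeneID:136541', 'HPRD:15572']}
```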
EMBL
ID    7    standard; DNA; HTG; 5916 BP.
AC    chromosome:GRCh37:7:141951963:141957878:-1
[...]
OS    Homo sapiens (human)
OC    Eukaryota; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia;
OC    Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi;
OC    Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria;
OC    Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini;
OC    Hominoidea; Hominidae; Homininae; Homo.
[...]
FT    gene            1..5916
FT                    /gene=ENSG00000171147
FT                    /locus_tag="U66059.56"
FT                    /note="Trypsin-X3 Precursor (EC 3.4.21.4)
[...]
FT    CDS             join(352..391,2386..2524,2748..3004,5448..5587,5689..5838)
FT                    /gene="ENSESTG00000027201"
FT                    /protein_id="ENSESTP00000068598"
FT                    /note="transcript_id=ENSESTT00000068598"
FT                    /translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGVL
FT                    IHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDIML
[...]
SQ    Sequence 5916 BP; 1714 A; 1266 C; 1022 G; 1914 T; 0 other;
     AAGGCTGGCA AAAAGGAGAC CAGACAGGAG GCGTCTGTAG AGATATCATG AACTTCAACT         60
     TAGCTTTGGT ACTTTCTTCC CTGAAGACAG AGGGCAGAAC TCTGAGTTCC AGAACCATTT        120
     TCAACTGTAT TGGGGACCAA TCACTTGACT CTATTCTTGT CTCTCTGACA GATGACGCTA        180
     CACTCTCCTC TGAATAATGG ACACCATTTC TAAAACTGAA TCCTGCTACT AAAATAATTC        240
[...]
     GTAATCCAAA ATAACTGAGC TGTGGCAGTT GTGGACCATA TGACACAGCT TGTCCCCATC       5880
     GTTCACCTTT AGAATTAAAT ATAAATTAAC TCCTCA                                 5916
//
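The EMBL CDS above is split into exons with join(). This is a minimal sketch that turns such a location into (start, end) pairs and concatenates the corresponding pieces of the sequence. It assumes no complement(), no nesting and no '<'/'>' partial ends. `parse_join` and `splice` are hypothetical names:

```python
from __future__ import annotations

import re


def parse_join(location: str) -> list[tuple[int, int]]:
    """Return the 1-based, inclusive (start, end) pairs of 'join(a..b,c..d,...)' or 'a..b'."""
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    pairs = []
    for part in location.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", part.strip())
        if match is None:
            raise ValueError(f"unsupported location part: {part!r}")
        pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def splice(sequence: str, location: str) -> str:
    """Concatenate the pieces of `sequence` listed in `location` (e.g. the exons of a CDS)."""
    return "".join(sequence[start - 1:end] for start, end in parse_join(location))


if __name__ == "__main__":
    cds = "join(352..391,2386..2524,2748..3004,5448..5587,5689..5838)"
    exons = parse_join(cds)
    # 726: the same length as the RefSeq CDS 110..835 shown earlier.
    print(sum(end - start + 1 for start, end in exons))
```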
Secondary sequence databases


      UniProt – Pfam – PROSITE – ...
UniProt   http://www.uniprot.org/
trypsin?
trypsin!
Example
ID    TRY3_HUMAN              Reviewed;         304 AA.
AC    P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;
DT    01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT    14-OCT-2008, sequence version 2.
DT    11-JAN-2011, entry version 111.
DE    RecName: Full=Trypsin-3;
DE             EC=3.4.21.4;
DE    AltName: Full=Brain trypsinogen;
DE    AltName: Full=Mesotrypsinogen;
[...]
CC    -!- FUNCTION: Digestive protease specialized for the degradation of
CC        trypsin inhibitors.
CC    -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC    -!- COFACTOR: Binds 1 calcium ion per subunit.
[...]
DR    PIR; S33496; S33496.
DR    RefSeq; NP_002762.2; NM_002771.3.
DR    UniGene; Hs.654513; -.
DR    PDB; 1H4W; X-ray; 1.70 A; A=81-304.
[...]
FT    DISULFID    196    263
FT    DISULFID    228    242
FT    DISULFID    253    277
[...]
SQ    SEQUENCE   304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
     MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
     DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
     CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
     STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
     FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
     AANS
//
Details
ID   TRY3_HUMAN                Reviewed;         304 AA.
      |                        |                   |
      name                   origin: Swiss-Prot   size

DT   01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT   14-OCT-2008, sequence version 2.
DT   11-JAN-2011, entry version 111.
      |
     dates: entry into UniProt, last change to the sequence, last change to the entry

DE   RecName: Full=Trypsin-3;
      |
     protein name

DE   AltName: Full=Brain trypsinogen;
DE   AltName: Full=Mesotrypsinogen;
DE   AltName: Full=Serine protease 3;
DE   AltName: Full=Serine protease 4;
DE   AltName: Full=Trypsin III;
      |
     alternative names

OS   Homo sapiens (Human).
      |
     organism

OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
      |
     taxonomy
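Every UniProt line starts with a two-letter code, and its data begins at column 6. An entry can therefore be sorted into fields without regular expressions. This is a minimal sketch. `group_by_line_code` is a hypothetical name, and so is the 'SQ_DATA' key used for the unlabelled sequence lines:

```python
from collections import defaultdict


def group_by_line_code(entry_text: str) -> dict:
    """Group the lines of one UniProt text entry by their two-letter line code."""
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):
            break  # end of the entry
        code = line[:2]
        data = line[5:].rstrip()  # data starts at column 6
        # Sequence lines after 'SQ' have a blank code.
        groups["SQ_DATA" if code == "  " else code].append(data)
    return dict(groups)


if __name__ == "__main__":
    entry = (
        "ID   TRY3_HUMAN              Reviewed;         304 AA.\n"
        "AC   P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;\n"
        "DE   RecName: Full=Trypsin-3;\n"
        "//\n"
    )
    groups = group_by_line_code(entry)
    accessions = [acc.strip() for line in groups["AC"] for acc in line.split(";") if acc.strip()]
    print(accessions[0])  # P35030, the primary accession
```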
Details (2)
RN   [1]
RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND B), AND VARIANT ALA-188.
RC   TISSUE=Brain;
RX   MEDLINE=94123994; PubMed=8294000; DOI=10.1016/0378-1119(93)90460-K;
RA   Wiegand U., Corbach S., Minn A., Kang J., Mueller-Hill B.;
RT   "Cloning of the cDNA encoding human brain trypsinogen and
RT   characterization of its product.";
RL   Gene 136:167-175(1993).
      |
     bibliographic reference

CC   -!- FUNCTION: Digestive protease specialized for the degradation of
CC       trypsin inhibitors.
CC   -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC   -!- COFACTOR: Binds 1 calcium ion per subunit.
CC   -!- SUBCELLULAR LOCATION: Secreted.
      |
     annotations (function, subcellular location)

DR   PIR; S12764; S12764.
DR   PIR; S33496; S33496.
DR   RefSeq; NP_002762.2; NM_002771.3.
DR   UniGene; Hs.654513; -.
      |
     identifiers in other databases

PE    1: Evidence at protein level;
      |
     level of evidence for the existence (expression) of the protein
Details (3)
FT    MOD_RES      211    211       Sulfotyrosine (By similarity).
FT    DISULFID      87    217
FT    DISULFID     105    121
[...]
FT    STRAND       111    117
FT    HELIX        119    121
       |
     sequence annotations

SQ   SEQUENCE   304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
     MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
     DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
     CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
     STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
     FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
     AANS
      |
     protein sequence

//
 |
end of the entry
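Feature positions refer to the sequence, so they can be cross-checked: both ends of a disulfide bond should be cysteines. This is a minimal sketch for the column layout shown in these slides. More recent UniProt releases write feature positions differently, so the pattern would need adapting. `disulfide_bonds` and `check_cysteines` are hypothetical names:

```python
import re

# 'FT   DISULFID     105    121': feature key, then first and last position (1-based).
FT_DISULFID = re.compile(r"^FT\s+DISULFID\s+(\d+)\s+(\d+)")


def disulfide_bonds(ft_lines):
    """Extract (first, second) residue positions from 'FT DISULFID' lines."""
    bonds = []
    for line in ft_lines:
        match = FT_DISULFID.match(line)
        if match:
            bonds.append((int(match.group(1)), int(match.group(2))))
    return bonds


def check_cysteines(sequence, bonds):
    """Return the bonds whose two ends are not both cysteines (C)."""
    return [(a, b) for a, b in bonds if sequence[a - 1] != "C" or sequence[b - 1] != "C"]


if __name__ == "__main__":
    trypsin3 = (  # SQ block of TRY3_HUMAN above, without spaces
        "MCGPDDRCPARWPGPGRAVKCGKGLAAARPGRVERGGAQRGGAGLELHPLLGGRTWRAAR"
        "DADGCEALGTVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAH"
        "CYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARV"
        "STISLPTTPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSM"
        "FCVGFLEGGKDSCQRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTI"
        "AANS"
    )
    ft = ["FT    DISULFID      87    217", "FT    DISULFID     105    121"]
    print(check_cysteines(trypsin3, disulfide_bonds(ft)))  # []: every bond joins two C
```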
Remarks

               extension .txt


               also available as .xml


     Python: strings/lists
          + regular expressions
                (+ xml module)
xml
<?xml version='1.0' encoding='UTF-8'?>
<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
<entry dataset="Swiss-Prot" created="1994-02-01" modified="2011-01-11" version="111">
<accession>P35030</accession>
<accession>A9Z1Y4</accession>
<accession>P15951</accession>
<accession>Q15665</accession>
[...]
<dbReference type="NCBI Taxonomy" id="9606" key="2"/>
<lineage>
<taxon>Eukaryota</taxon>
<taxon>Metazoa</taxon>
<taxon>Chordata</taxon>
[...]
<feature type="disulfide bond">
<location>
<begin position="228"/>
<end position="242"/>
[...]
<feature type="strand">
<location>
<begin position="133"/>
<end position="137"/>
[...]
<sequence length="304" mass="32529" checksum="4C4303C310B7BFFC" modified="2008-10-14" version="2"
MCGPDDRCPARWPGPGRAVKCGKGLAAARPGRVERGGAQRGGAGLELHPLLGGRTWRAAR
DADGCEALGTVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAH
CYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARV
STISLPTTPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSM
FCVGFLEGGKDSCQRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTI
AANS
</sequence>
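The same entry in XML can be read with Python's standard `xml.etree.ElementTree` module. The `<uniprot>` element declares a default namespace, so every tag must be looked up with that namespace. This is a minimal sketch; `summarize_entry` is a hypothetical name, and it uses only the elements shown above:

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <uniprot> root element above.
NS = {"up": "http://uniprot.org/uniprot"}


def summarize_entry(xml_data) -> dict:
    """Extract accessions, disulfide bonds and the sequence from a UniProt XML document."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    entry = root.find("up:entry", NS)
    accessions = [acc.text for acc in entry.findall("up:accession", NS)]
    bonds = []
    for feature in entry.findall("up:feature[@type='disulfide bond']", NS):
        begin = feature.find("up:location/up:begin", NS)
        end = feature.find("up:location/up:end", NS)
        bonds.append((int(begin.get("position")), int(end.get("position"))))
    sequence_el = entry.find("up:sequence", NS)
    return {
        "accessions": accessions,
        "disulfide_bonds": bonds,
        "length": int(sequence_el.get("length")),
        # Remove the line breaks inside <sequence>.
        "sequence": "".join(sequence_el.text.split()),
    }
```

For a downloaded file (the name `P35030.xml` is hypothetical), call `summarize_entry(open("P35030.xml", "rb").read())`.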
4    Structures
Protein Data Bank (PDB)

       structures: DNA, RNA, proteins, viruses...


     X-ray crystallography, NMR, cryo-electron microscopy
PDB   http://www.rcsb.org/pdb/home/home.do
trypsin?
trypsin!
Example
HEADER     HYDROLASE (SERINE PROTEINASE)           26-OCT-81   2PTN
TITLE      ON THE DISORDERED ACTIVATION DOMAIN IN TRYPSINOGEN.
TITLE     2 CHEMICAL LABELLING AND LOW-TEMPERATURE CRYSTALLOGRAPHY
COMPND     MOL_ID: 1;
COMPND    2 MOLECULE: TRYPSIN;
COMPND    3 CHAIN: A;
COMPND    4 EC: 3.4.21.4;
COMPND    5 ENGINEERED: YES
SOURCE     MOL_ID: 1;
SOURCE    2 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE    3 ORGANISM_COMMON: CATTLE;
SOURCE    4 ORGANISM_TAXID: 9913
KEYWDS     HYDROLASE (SERINE PROTEINASE)
EXPDTA     X-RAY DIFFRACTION
[...]
REMARK    2 RESOLUTION.         1.55 ANGSTROMS.
[...]
[...]
ATOM    273  N   ALA A  55       6.294  11.611  25.982  1.00  9.30           N
ATOM    274  CA  ALA A  55       6.778  12.670  25.099  1.00  9.30           C
ATOM    275  C   ALA A  55       7.329  13.864  25.883  1.00  9.30           C
ATOM    276  O   ALA A  55       6.747  14.218  26.934  1.00  9.30           O
ATOM    277  CB  ALA A  55       5.636  13.154  24.190  1.00  9.30           C
ATOM    278  N   ALA A  56       8.461  14.383  25.454  1.00  7.97           N
ATOM    279  CA  ALA A  56       9.069  15.522  26.129  1.00  7.97           C
ATOM    280  C   ALA A  56       8.143  16.740  26.167  1.00  7.97           C
ATOM    281  O   ALA A  56       8.162  17.496  27.169  1.00  7.97           O
ATOM    282  CB  ALA A  56      10.414  15.918  25.506  1.00  7.97           C
[...]
PDB

     header
     ------------
     coordinates
Coordinates

     PyMOL
     RasMol
     VMD
     ...


     Python
Coordinates
ATOM    601  N   LEU A  99      10.007  19.687  17.536  1.00 12.25           N
ATOM    602  CA  LEU A  99       9.599  18.429  18.188  1.00 12.25           C
ATOM    603  C   LEU A  99      10.565  17.281  17.914  1.00 12.25           C
ATOM    604  O   LEU A  99      10.256  16.101  18.215  1.00 12.25           O
ATOM    605  CB  LEU A  99       8.149  18.040  17.853  1.00 12.25           C
ATOM    606  CG  LEU A  99       7.125  19.029  18.438  1.00 18.18           C
ATOM    607  CD1 LEU A  99       5.695  18.554  18.168  1.00 18.18           C
ATOM    608  CD2 LEU A  99       7.323  19.236  19.952  1.00 18.18           C
Remarks

                     several chains


                several structures (NMR)


                      gaps (X-ray)


     Python: strings (slices) + lists
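PDB coordinate lines are read with string slices, as noted above. Every field has fixed columns, and neighbouring fields can touch: see '-59.462-109.221' in the gaps example. Splitting on whitespace is therefore unreliable. This is a minimal sketch of the column layout of ATOM/HETATM records; `parse_atom` is a hypothetical name:

```python
def parse_atom(line: str) -> dict:
    """Read an ATOM/HETATM record by slicing its fixed columns (PDB format)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "i_code": line[26].strip(),       # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78 (may be absent)
    }


if __name__ == "__main__":
    # Atom 7568 of the gaps example: x and y touch, with no space in between.
    line = "ATOM   7568  CB  LYS B  72    " + " -59.462-109.221 -72.440  1.00 31.64" + " " * 10 + " C"
    atom = parse_atom(line)
    print(atom["res_name"], atom["res_seq"], atom["x"], atom["y"])  # LYS 72 -59.462 -109.221
    print(line.split()[6])  # -59.462-109.221 (whitespace splitting merges x and y)
```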
Several chains


ATOM    955  CD2 TYR A 117      28.547  16.730  59.818  1.00 34.54           C
ATOM    956  CE1 TYR A 117      26.512  14.828  59.696  1.00 34.81           C
ATOM    957  CE2 TYR A 117      28.117  16.089  60.985  1.00 35.96           C
ATOM    958  CZ  TYR A 117      27.100  15.139  60.917  1.00 35.42           C
ATOM    959  OH  TYR A 117      26.673  14.515  62.069  1.00 37.14           O
ATOM    960  OXT TYR A 117      25.735  19.061  58.351  1.00 32.81           O
TER     961      TYR A 117
ATOM    962  N   ARG B   3      42.047  55.053  18.876  1.00 34.90           N
ATOM    963  CA  ARG B   3      42.680  56.307  19.383  1.00 35.03           C
ATOM    964  C   ARG B   3      43.365  56.041  20.722  1.00 33.56           C
ATOM    965  O   ARG B   3      42.720  55.647  21.691  1.00 33.47           O
ATOM    966  CB  ARG B   3      41.614  57.395  19.562  1.00 37.48           C
ATOM    967  CG  ARG B   3      40.638  57.499  18.394  1.00 41.05           C
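The chain identifier sits in column 22, and a TER record closes each chain. This is a minimal sketch that lists the residues of every chain using slices and lists only. `residues_by_chain` is a hypothetical name, and insertion codes are ignored:

```python
def residues_by_chain(pdb_lines) -> dict:
    """Map each chain identifier (column 22) to its (number, name) residues, in file order."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # TER (end of a chain), HETATM, headers... are skipped
        chain = line[21]
        residue = (int(line[22:26]), line[17:20].strip())
        residues = chains.setdefault(chain, [])
        if not residues or residues[-1] != residue:  # a new residue starts
            residues.append(residue)
    return chains


if __name__ == "__main__":
    lines = [
        "ATOM    960  OXT TYR A 117      25.735  19.061  58.351  1.00 32.81           O",
        "TER     961      TYR A 117",
        "ATOM    962  N   ARG B   3      42.047  55.053  18.876  1.00 34.90           N",
    ]
    for chain, residues in residues_by_chain(lines).items():
        print(chain, len(residues), residues[0])  # A 1 (117, 'TYR'), then B 1 (3, 'ARG')
```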
Several structures
MODEL        1
ATOM      1  N   GLY A   1      11.935 -10.938   0.352  1.00  0.00           N
ATOM      2  CA  GLY A   1      13.344 -10.643   0.600  1.00  0.00           C
ATOM      3  C   GLY A   1      13.861  -9.576  -0.330  1.00  0.00           C
ATOM      4  O   GLY A   1      14.929  -9.728  -0.931  1.00  0.00           O
[...]
ATOM    934  HB2 GLU A  60       9.981   7.744   1.905  1.00  0.00           H
ATOM    935  HB3 GLU A  60      10.321   6.103   2.451  1.00  0.00           H
ATOM    936  HG2 GLU A  60      12.152   6.972   3.824  1.00  0.00           H
ATOM    937  HG3 GLU A  60      11.700   8.597   3.310  1.00  0.00           H
TER     938      GLU A  60
ENDMDL
MODEL        2
ATOM      1  N   GLY A   1      19.334  -6.988   0.864  1.00  0.00           N
ATOM      2  CA  GLY A   1      18.296  -6.813   1.874  1.00  0.00           C
ATOM      3  C   GLY A   1      18.000  -5.370   2.142  1.00  0.00           C
ATOM      4  O   GLY A   1      18.677  -4.724   2.959  1.00  0.00           O
[...]
ATOM    934  HB2 GLU A  60      11.353   9.615  -0.439  1.00  0.00           H
ATOM    935  HB3 GLU A  60      13.095   9.643  -0.204  1.00  0.00           H
ATOM    936  HG2 GLU A  60      13.380  10.930  -2.203  1.00  0.00           H
ATOM    937  HG3 GLU A  60      11.654  10.817  -2.534  1.00  0.00           H
TER     938      GLU A  60
ENDMDL
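An NMR entry stores each model between MODEL and ENDMDL, with the same atom numbering in every model. This is a minimal sketch that splits the file into one list of ATOM lines per model. `split_models` is a hypothetical name. A file without MODEL records is treated as a single model:

```python
def split_models(pdb_lines) -> list:
    """Return one list of ATOM lines per MODEL ... ENDMDL block."""
    models, current, outside = [], None, []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL" and current is not None:
            models.append(current)
            current = None
        elif record == "ATOM":
            (outside if current is None else current).append(line)
    # No MODEL records (e.g. most X-ray entries): a single model.
    return models or [outside]


if __name__ == "__main__":
    lines = [
        "MODEL        1",
        "ATOM      1  N   GLY A   1      11.935 -10.938   0.352  1.00  0.00           N",
        "ENDMDL",
        "MODEL        2",
        "ATOM      1  N   GLY A   1      19.334  -6.988   0.864  1.00  0.00           N",
        "ENDMDL",
    ]
    for number, atoms in enumerate(split_models(lines), start=1):
        print(number, float(atoms[0][30:38]))  # x of atom 1 differs between models
```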
Gaps
[...]
ATOM   7568  CB  LYS B  72     -59.462-109.221 -72.440  1.00 31.64           C
ATOM   7569  CG  LYS B  72     -58.524-109.915 -73.424  1.00 31.85           C
ATOM   7570  CD  LYS B  72     -58.889-109.602 -74.868  1.00 32.02           C
ATOM   7571  CE  LYS B  72     -58.174-110.533 -75.837  1.00 31.61           C
ATOM   7572  NZ  LYS B  72     -58.629-110.335 -77.242  1.00 31.27           N
ATOM   7573  N   GLY B  73     -61.309-106.416 -72.158  1.00 31.85           N
ATOM   7574  CA  GLY B  73     -62.485-105.832 -71.510  1.00 30.84           C
ATOM   7575  C   GLY B  73     -63.598-106.848 -71.303  1.00 29.65           C
ATOM   7576  O   GLY B  73     -64.660-106.750 -71.920  1.00 28.85           O
ATOM   7577  N   SER B  74     -63.354-107.820 -70.425  1.00 28.53           N
ATOM   7578  CA  SER B  74     -64.301-108.911 -70.179  1.00 27.75           C
ATOM   7579  C   SER B  74     -64.180-109.438 -68.754  1.00 26.72           C
ATOM   7580  O   SER B  74     -65.113-110.041 -68.227  1.00 24.48           O
ATOM   7581  CB  SER B  74     -64.070-110.058 -71.166  1.00 26.32           C
ATOM   7582  OG  SER B  74     -64.505-109.716 -72.470  1.00 25.54           O
ATOM   7583  N   GLN B  79     -62.682-105.888 -62.336  1.00 42.85           N
ATOM   7584  CA  GLN B  79     -63.246-104.902 -63.248  1.00 42.57           C
ATOM   7585  C   GLN B  79     -62.146-104.278 -64.103  1.00 42.60           C
ATOM   7586  O   GLN B  79     -60.992-104.191 -63.681  1.00 42.45           O
ATOM   7587  CB  GLN B  79     -63.996-103.819 -62.464  1.00 42.46           C
ATOM   7588  CG  GLN B  79     -64.950-102.964 -63.300  1.00 42.30           C
ATOM   7589  CD  GLN B  79     -66.093-103.764 -63.905  1.00 42.15           C
ATOM   7590  OE1 GLN B  79     -66.388-104.879 -63.472  1.00 42.18           O
ATOM   7591  NE2 GLN B  79     -66.743-103.194 -64.911  1.00 41.70           N
ATOM   7592  N   VAL B  80     -62.514-103.846 -65.305  1.00 42.30           N
ATOM   7593  CA  VAL B  80     -61.549-103.342 -66.275  1.00 42.03           C
ATOM   7594  C   VAL B  80     -60.882-102.055 -65.796  1.00 42.42           C
ATOM   7595  O   VAL B  80     -61.544-101.165 -65.260  1.00 43.09           O
[...]
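In this example, chain B jumps from residue 74 to residue 79. Residues 75 to 78 have no coordinates, typically because they were not resolved in the X-ray structure. This is a minimal sketch that looks only at the numbering. Insertion codes or renumbered segments can give false positives. `find_gaps` is a hypothetical name:

```python
def find_gaps(pdb_lines, chain: str) -> list:
    """Return (before, after) residue numbers where the numbering of `chain` jumps by more than 1."""
    numbers = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain:
            number = int(line[22:26])
            if not numbers or numbers[-1] != number:
                numbers.append(number)
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


if __name__ == "__main__":
    lines = [
        "ATOM   7582  OG  SER B  74     -64.505-109.716 -72.470  1.00 25.54           O",
        "ATOM   7583  N   GLN B  79     -62.682-105.888 -62.336  1.00 42.85           N",
    ]
    print(find_gaps(lines, "B"))  # [(74, 79)]
```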
5    A few precautions
A few precautions
                            be careful with the data
GenBank Z71230
LOCUS         Z71230                   124 bp    DNA     linear   PLN 14-NOV-2006
DEFINITION    Nicotiana tabacum chloroplast JLA region, sequence 2.
ACCESSION     Z71230
VERSION       Z71230.1 GI:1279604
KEYWORDS      rpl2 gene; transfer RNA-His; trnH gene.
SOURCE        chloroplast Nicotiana tabacum (common tobacco)
  ORGANISM    Nicotiana tabacum
              Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
              Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
              asterids; lamiids; Solanales; Solanaceae; Nicotianoideae;
              Nicotianeae; Nicotiana.
REFERENCE     1 (bases 1 to 124)
  AUTHORS     Goulding,S.E., Olmstead,R.G., Morden,C.W. and Wolfe,K.H.
  TITLE       Ebb and flow of the chloroplast inverted repeat
  JOURNAL     Mol. Gen. Genet. 252 (1-2), 195-206 (1996)
   PUBMED     8804393
[...]
FEATURES              Location/Qualifiers
     source           1..124
                      /organism="Nicotiana tabacum"
                      /organelle="plastid:chloroplast"
                      /mol_type="genomic DNA"
                      /isolate="Cuban cahibo cigar, gift from President Fidel
                      Castro"
                      /db_xref="taxon:4097"
     gene             <1..11
                      /gene="rpl2"
GenBank NC_001610
LOCUS         NC_001610              17084 bp    DNA     circular MAM 14-APR-2009
DEFINITION    Didelphis virginiana mitochondrion, complete genome.
ACCESSION     NC_001610
VERSION       NC_001610.1 GI:5835037
DBLINK        Project: 11806
KEYWORDS      .
SOURCE        mitochondrion Didelphis virginiana (North American opossum)
  ORGANISM    Didelphis virginiana
              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
              Mammalia; Metatheria; Didelphimorphia; Didelphidae; Didelphis.
REFERENCE     1 (bases 1 to 17084)
  AUTHORS     Janke,A., Feldmaier-Fuchs,G., Thomas,W.K., von Haeseler,A. and
              Paabo,S.
  TITLE       The marsupial mitochondrial genome and the evolution of placental
              mammals
  JOURNAL     Genetics 137 (1), 243-256 (1994)
   PUBMED     8056314
[...]
FEATURES              Location/Qualifiers
     source           1..17084
                      /organism="Didelphis virginiana"
                      /organelle="mitochondrion"
                      /mol_type="genomic DNA"
                      /isolate="fresh road killed individual"
                      /db_xref="taxon:9267"
                      /tissue_type="liver"
                      /dev_stage="adult"
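The LOCUS line packs the main properties of a record into a single line: name, length, molecule type, topology, division and modification date. This one says `circular` and `MAM` (mammals); the topology matters, for instance, when a feature spans the end and the start of the sequence. The sketch below splits the line on whitespace. It assumes all seven fields are present, as in the three records shown here (older or non-NCBI files may omit some), and `Locus`/`parse_locus` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length: int
    unit: str       # "bp" for nucleotides, "aa" for proteins
    molecule: str   # e.g. "DNA", "mRNA", "RNA"
    topology: str   # "linear" or "circular"
    division: str   # e.g. "PLN", "MAM", "VRL", "PRI"
    date: datetime  # date of last modification


def parse_locus(line):
    """Split a GenBank LOCUS line into its fields.

    Assumes the seven whitespace-separated fields after "LOCUS" are all
    present, as in the records shown here; lines with a missing field are
    rejected rather than guessed.
    """
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, unit, molecule, topology, division, date = fields
    return Locus(name, int(length), unit, molecule, topology, division,
                 datetime.strptime(date, "%d-%b-%Y"))


if __name__ == "__main__":
    locus = parse_locus(
        "LOCUS       NC_001610              17084 bp    DNA     circular MAM 14-APR-2009")
    print(locus.topology, locus.length, locus.date.date())
    # circular 17084 2009-04-14
```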
GenBank 252544
LOCUS         252544                   649 bp    RNA     linear   VRL 19-SEP-2002
DEFINITION    gene 7 3' end, 5' end, segment 7 [human rotavirus, strain Wa,
              Genomic RNA, 425 nt 2 segments].
ACCESSION
VERSION         GI:252544
KEYWORDS      .
SOURCE        Human rotavirus A
  ORGANISM    Human rotavirus A
              Viruses; dsRNA viruses; Reoviridae; Sedoreovirinae; Rotavirus;
              Rotavirus A.
[...]
FEATURES                 Location/Qualifiers
     source              1..649
                         /organism="Human rotavirus A"
                         /mol_type="genomic RNA"
                         /strain="Wa"
                         /db_xref="taxon:10941"
ORIGIN
        1   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
       61   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      121   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      181   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      241   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      301   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      361   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      421   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      481   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      541   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnnn   nnnnnnnnnn
      601   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn   nnnnnnnnnn     nnnnnnnnn
//
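Every position of this record is `n`, the IUPAC code for an unknown base: the LOCUS line declares 649 bp, but the entry carries no actual sequence. A check like the one below can catch such records before they reach an analysis. It is a minimal sketch that rebuilds the sequence from the ORIGIN section (each line starts with the position of its first base) and measures the fraction of `n`; `read_origin` and `unknown_fraction` are hypothetical helpers.

```python
def read_origin(lines):
    """Rebuild the sequence stored in the ORIGIN section of a GenBank record.

    Each line of the section starts with the position of its first base,
    followed by blocks of (at most) 10 bases; the record ends with "//".
    """
    blocks = []
    in_origin = False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            blocks.extend(line.split()[1:])  # drop the leading position
    return "".join(blocks).lower()


def unknown_fraction(seq, unknown="n"):
    """Fraction of positions holding the 'unknown base' code."""
    return seq.count(unknown) / len(seq) if seq else 0.0


if __name__ == "__main__":
    toy_record = [  # toy input, not taken from a real entry
        "ORIGIN",
        "        1 nnnnnnnnnn acgtacgtac",
        "//",
    ]
    seq = read_origin(toy_record)
    print(len(seq), unknown_fraction(seq))
    # 20 0.5
```

Applied to record 252544, `read_origin` returns 649 bases, consistent with the LOCUS line, and `unknown_fraction` returns 1.0: the length check passes even though the content is empty, which is why both checks are worth running.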
PDB 7GBP, chain D, residue 67

Oops!
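The slide does not say what is wrong with this residue, but checking such a case yourself starts with pulling its atoms out of the PDB file. A minimal sketch, assuming the fixed column positions of the PDB format version 2.3 (see the references); it also keeps the alternate-location indicator and the insertion code, two fields that let one residue number map to several atoms with the same name, or to several residues. `residue_atoms` is a hypothetical helper.

```python
def residue_atoms(pdb_lines, chain, res_seq):
    """Return the atoms of one residue from PDB-format lines.

    Uses the fixed columns of the PDB format v2.3 (1-based, inclusive):
    record name 1-6, atom name 13-16, altLoc 17, residue name 18-20,
    chain 22, residue number 23-26, insertion code 27, x/y/z 31-54.
    """
    atoms = []
    for line in pdb_lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        if line[21] != chain or int(line[22:26]) != res_seq:
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            # Several conformations of the same atom differ by altLoc.
            "alt_loc": line[16].strip(),
            # 67 and 67A share the residue number 67; the code tells them apart.
            "i_code": line[26].strip(),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms
```

Listing the returned names, alternate locations and coordinates side by side can help spot a duplicated atom, a missing one or an unexpected residue name.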
Menu
1    Background

2    Problem statement

3    Sequences

4    Structures

5    Some precautions

6    Conclusion

7    References & image credits
data: sequences, structures...
formats / information
standards exist
        ... but they are not always followed
think about the objects you are handling
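Since standards are not always followed, it can pay to check a file before trusting it. Below is a minimal sketch of a FASTA reader that reports problems rather than silently working around them: a space right after `>`, a sequence line before any header, lines longer than 80 characters, characters outside an allowed alphabet, duplicate headers. The alphabet defaults to the IUPAC nucleotide codes, an assumption to change for protein files, and `check_fasta` is a hypothetical name.

```python
# IUPAC nucleotide codes (assumption: nucleotide files; use amino-acid
# letters instead for protein FASTA files).
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN")
MAX_LINE_LENGTH = 80


def check_fasta(text, alphabet=IUPAC_NUCLEOTIDES, max_len=MAX_LINE_LENGTH):
    """Parse FASTA text and return (records, problems).

    records maps each header (without ">") to its upper-cased sequence;
    problems lists human-readable messages with line numbers.
    """
    records, problems = {}, []
    header = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            if line.startswith("> "):
                problems.append(f"line {lineno}: space after '>'")
            header = line[1:].strip()
            if header in records:
                # The later record replaces the earlier one in `records`.
                problems.append(f"line {lineno}: duplicate header {header!r}")
            records[header] = ""
            continue
        if header is None:
            problems.append(f"line {lineno}: sequence before any '>' header")
            continue
        if len(line) > max_len:
            problems.append(f"line {lineno}: {len(line)} characters (> {max_len})")
        bad = sorted(set(line.upper()) - alphabet)
        if bad:
            problems.append(f"line {lineno}: unexpected characters {bad}")
        records[header] += line.upper()
    return records, problems
```

For example, `check_fasta(">seq1\nACGXT\n")` keeps the record but reports the unexpected `X` on line 2; deciding what to do with it is left to the user.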
Menu
1    Background

2    Problem statement

3    Sequences

4    Structures

5    Some precautions

6    Conclusion

7    References & image credits
References
J.-C. Gelly, course "Bases de données en biologie" (Databases in biology)

Bioinformatics for Dummies, by J.-M. Claverie and C. Notredame

BioStar: Incorrect / unusual entries in main databases (GenBank, UniProt, PDB)?
http://biostar.stackexchange.com/questions/10869/incorrect-unusual-entries-in-main-databases-genbank-uniprot-pdb
References (2)
FASTA format: http://en.wikipedia.org/wiki/FASTA_format

GenBank: http://www.ncbi.nlm.nih.gov/
format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

UniProt: http://www.uniprot.org/
format: http://www.uniprot.org/manual/

PDB: http://www.rcsb.org/pdb/home/home.do
format: http://www.wwpdb.org/documentation/format23/v2.3.html
Image credits
Squidonius (Wikimedia)
Ralphbijker (Flickr)
USDA/ARS
Viktorvoigt (Wikimedia)
Icons-Land (Findicons)
herzogbr (Flickr)
Icons-Land (Findicons)
PAPYRARRI (Flickr)

Formats de données en biologie

  • 1. Formats de données en biologie Pierre Poulain pierre.poulain@univ-paris-diderot.fr 09/2011
  • 2. Menu 1 Rappels 2 Problématique 3 Séquences 4 Structures 5 Quelques précautions 6 Conclusion 7 Références & crédits graphiques PP Université Paris Diderot - Paris 7 2
  • 3. Menu 1 Rappels 2 Problématique 3 Séquences 4 Structures 5 Quelques précautions 6 Conclusion 7 Références & crédits graphiques PP Université Paris Diderot - Paris 7 3
  • 4. Dogme de la biologie ADN ARN protéine transcription traduction PP Université Paris Diderot - Paris 7 4
  • 5. Menu 1 Rappels 2 Problématique 3 Séquences 4 Structures 5 Quelques précautions 6 Conclusion 7 Références & crédits graphiques PP Université Paris Diderot - Paris 7 5
  • 6. Expérimentalement ADN ARN protéine A,T,C,G A,U,C,G V,G,W,C... AAGATGACCGTGTGTCAT TTGATCCTGAACTGTTTG AAAAAATGTTCCGTGACG GACTCTTTGATGATGAGA CCTCGGAAGTAACGGAGC AGCGCAATGTTCCGTGAC CAGCTGACAATGTATCAG ATTCCAGACTGGATCAGA TCTGAATGCCATTAGCTT PP Université Paris Diderot - Paris 7 6
  • 7. TTGTCACCTGTACACTGGCATTACTACACAGAAACCCAGATGTCCGTTACC Séquences > structures AAGATGACCGTGTGTCATTCATTCCTAAGATTCAAAATGATTTCGATGGCA TTGATCCTGAACTGTTTGAATTGAGAAAAGCTGTTATGGACACCAATGAAA AAAAAATGTTCCGTGACGACACTTTCGGCAAGAACCTGAATGCAAACACAA GACTCTTTGATGATGAGACTAGTTCATCCTCTTTTAAGCAAAATTCCTCTC CCTCGGAAGTAACGGAGCAACCTGTGCAACCAACCTCCGCTGTCATGGGTA GCTTCTTGTCTCCACAGTACCAACGTGCGTCATCTGCTTCTCGTACTAATC ATAATACAAGCACCTCCAGTTTAATGAAGCCTGAATCAAGTCTCTACCTGG ATAAATCATATTCGCATTTTAACAACAACGGCAGCAACGAAAACGCCCGCA CATATTTGTAATCCAATATATACTCACATGTAACAACTTATTATATAAATA AAGGATATCCTACATTATATTTCATAGAAAACCGCTCAAAAAGGTGTATTA CATCCCAACACCACACATATTTCAGCGATAAAAACCTTAAATGTGAAATTC CTGCTTCCTTAAATGTACGCAATTGCCGCTTTTTTCTGACATCTTTTTTGA AAGGAAACAGATCCTCCAGAAGGGATTTACTGTTGGCTATTTTGTGTTAGA ATAATAGATTAGGTTGCGTAAGTCATGGTCGAAAATAGTACGCAGAAGGCC GGAAATGATGATAATAGCTCTACCAAGCCATATTCGGAGGCGTTTTTCTTA AACCCAACGCCTGGATTAGAAGCTGAGCACTCAAGCACATCGCCTGCCCCC AACTTGAAAATCGGTATGCTATTATCAATGCTTTACAATTCTGTCGGTTAC GAGGATCATTGCCCTCAAGGTGGCGAATATTCGGATTTATTGAGAAATTTG PP Université Paris Diderot - Paris 7 7 TGTGAAGCTATTTTGCCATCTTACGAAATTATTGAACGCTACAAGAACCAC
  • 8. Séquences > structures PP Université Paris Diderot - Paris 7 8
  • 9. Séquences > structures PP Université Paris Diderot - Paris 7 9
  • 12. Menu 1 Rappels 2 Problématique 3 Séquences 4 Structures 5 Quelques précautions 6 Conclusion 7 Références & crédits graphiques PP Université Paris Diderot - Paris 7 12
  • 13. Séquences nucléiques, protéiques PP Université Paris Diderot - Paris 7 13
  • 14. Format Fasta Le plus simple >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY PP Université Paris Diderot - Paris 7 14
  • 15. Fasta >en-tête séquence sur 80 caractères maximum par ligne séquence sur 80 caractères maximum par ligne séquence sur 80 caractères maximum par ligne séquence sur 80 caractères maximum par ligne séquence sur 80 carac PP Université Paris Diderot - Paris 7 15
  • 16. Remarques > colle en-tête longueur de chaque ligne fixée extensions .fasta, .seq, .fas, .fna, .faa Python : chaînes de caractères + listes + (biopython) PP Université Paris Diderot - Paris 7 16
  • 17. Multifasta >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY >gi|134252438|gb|ABO64984.1| cytochrome b [Elephantulus rupestris] TAFSSVTHICRDVNYGWLIRYLHANGASLFFICLFIHVGRGIYYGSYLYFETWNIGVILLFITMATAFMG YVLPWGQMSFWGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFL HETGSNNPLGLVSDSDKIPFHPYYTIKDLLGVFAILILHLSLVLFSPDLLGDPDNYTPANPLNTPPHIKP EWYFLFAYAILRSIPNKLGGVLALVLSILILIIFPLLHTSKQRSLMFRPISQCLFWVLVADLLTLTWIGG QPVEHPYIIIGQLASILYFTIILVLMPIAGVIENHIIKL >gi|157367467|gb|ABV45600.1| cytochrome b [Mammuthus primigenius] MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHIC RDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSF WGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFALHFILPFTMIALAGVHLTFLHETGSNNPLG LTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAI LRSVPNKLGGILALLLSILILGMMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEHPYIII GQMASILYFSIILAFLPIAGMIENYLIK PP Université Paris Diderot - Paris 7 17
  • 18. Bases de données de séquences primaires GenBank – EMBL – DDBJ PP Université Paris Diderot - Paris 7 18
  • 19. GenBank http://www.ncbi.nlm.nih.gov/
  • 22. Exemple LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010 DEFINITION Homo sapiens trypsin X3 (TRYX3), mRNA. ACCESSION NM_001001317 VERSION NM_001001317.2 GI:170650697 [...] FEATURES Location/Qualifiers source 1..940 /organism="Homo sapiens" /mol_type="mRNA" /db_xref="taxon:9606" /chromosome="7" /map="7q34" gene 1..940 /gene="TRYX3" /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540" /note="trypsin X3" /db_xref="GeneID:136541" /db_xref="HPRD:15572" [...] ORIGIN 1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact 61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat [...] 781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt 841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata 901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa // PP Université Paris Diderot - Paris 7 22
  • 23. Exemple LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010 DEFINITION ACCESSION VERSION Homo sapiens trypsin X3 (TRYX3), mRNA. NM_001001317 NM_001001317.2 GI:170650697 en-tête [...] FEATURES Location/Qualifiers source 1..940 /organism="Homo sapiens" /mol_type="mRNA" /db_xref="taxon:9606" /chromosome="7" gene /map="7q34" 1..940 features /gene="TRYX3" /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540" /note="trypsin X3" /db_xref="GeneID:136541" /db_xref="HPRD:15572" [...] ORIGIN 1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact [...] 61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat séquence 781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt 841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata 901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa // PP Université Paris Diderot - Paris 7 23
  • 24. En-tête LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010 | | | | | nom taille type de division date de molécule modification ACCESSION NM_001001317 | numéro d'accession (unique et stable) SOURCE Homo sapiens (human) | nom de l'organisme ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. | taxonomie REFERENCE 1 (bases 1 to 940) AUTHORS Bubb,K.L., Bovee,D., Buckley,D., Haugen,E., Kibukawa,M., Paddock,M., Palmieri,A., Subramanian,S., Zhou,Y., Kaul,R., Green,P. and Olson,M.V. TITLE Scan of human genome reveals no new Loci under ancient balancing selection JOURNAL Genetics 173 (4), 2165-2177 (2006) PUBMED 16751668 | référence bibliographique PP Université Paris Diderot - Paris 7 24
  • 25. Features gene 1..940 | start and end of the gene /gene="TRYX3" | gene name /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540" /note="trypsin X3" /db_xref="GeneID:136541" /db_xref="HPRD:15572" | identifiers in other databases CDS 110..835 | start and end of the coding sequence /gene="TRYX3" /gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540" /EC_number="3.4.21.4" /note="trypsin-X3" /codon_start=1 /product="trypsin-X3 precursor" | name of the protein produced /protein_id="NP_001001317.1" /db_xref="GI:48255915" /db_xref="CCDS:CCDS5871.1" /db_xref="GeneID:136541" /db_xref="HPRD:15572" /translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGV LIHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDI MLIKLKTEAELNDYVKLANLPYQTISENTMCSVSTWSYNVCDIYKEPDSLQTVNISVI SKPQCRDAYKTYNITENMLCVGIVPGRRQPCKEVSAAPAICNGMLQGILSFADGCVLR ADVGIYAKIFYYIPWIENVIQNN" | protein sequence
  • 26. Sequence ORIGIN 1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact 61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat 121 cctcctctgg gccctcttga atctgactgt tgctttggcc tttaatccag attacacagt 181 cagctccact cccccttact tggtctattt gaaatctgac tacttgccct gcgctggagt 241 cctgatccac ccgctttggg tgatcacagc tgcacactgc aatttaccaa agcttcgggt 301 gatattgggg gttacaatcc cagcagactc taatgaaaag catctgcaag tgattggcta 361 tgagaagatg attcatcatc cacacttctc agtcacttct attgatcatg acatcatgct 421 aatcaagctg aaaacagagg ctgaactcaa tgactatgtg aaattagcca acctgcccta 481 ccaaactatc tctgaaaata ccatgtgctc tgtctctacc tggagctaca atgtgtgtga 541 tatctacaaa gagcccgatt cactgcaaac tgtgaacatc tctgtaatct ccaagcctca 601 gtgtcgcgat gcctataaaa cctacaacat cacggaaaat atgctgtgtg tgggcattgt 661 gccaggaagg aggcagccct gcaaggaagt ttctgctgcc ccggcaatct gcaatgggat 721 gcttcaagga atcctgtctt ttgcggatgg atgtgttttg agagccgatg ttggcatcta 781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt 841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata 901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa // | nucleotide sequence of the record (here the mRNA)
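A usual first step is to turn the ORIGIN block into a plain string. Here is a minimal sketch, assuming the record is available as a list of lines. The helper names are hypothetical. The sketch also checks the leading position numbers, so a missing line (for example a `[...]` cut) raises an error instead of being silently skipped. GenBank intervals are 1-based with both ends included, which is why `extract` shifts the start by one.

```python
def parse_origin(lines):
    """Join the ORIGIN block of a GenBank record into one upper-case string.

    Each sequence line starts with the 1-based position of its first base,
    followed by blocks of 10 bases; the record ends with '//'.
    """
    bases = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("ORIGIN"):
            continue
        if line.startswith("//"):
            break
        number, *blocks = line.split()
        # The leading number must follow on from the bases already read.
        if int(number) != len(bases) + 1:
            raise ValueError(f"expected position {len(bases) + 1}, got {number}")
        for block in blocks:
            bases.extend(block)
    return "".join(bases).upper()


def extract(sequence, start, end):
    """Return a GenBank interval such as CDS 110..835 (1-based, both ends included)."""
    return sequence[start - 1:end]


record = [
    "ORIGIN",
    "        1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact",
    "       61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat",
    "//",
]
seq = parse_origin(record)
print(len(seq), extract(seq, 110, 112))  # 120 ATG
```

On these first two lines, the CDS 110..835 begins with the ATG start codon. On the full record, `extract(seq, 833, 835)` would return the TGA stop codon.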
  • 27. Remarks File extension: .gbk. Viewer: Artemis, http://www.sanger.ac.uk/resources/software/artemis/ The EMBL format (.embl) is close to .gbk. Python: strings/lists + regular expressions
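Staying with strings, lists and regular expressions, a single pattern can read the qualifiers of a feature. Here is a minimal sketch under these assumptions:

- A value is either quoted (possibly over several lines) or a single token such as `/codon_start=1`.
- Doubled quotes inside a value are not handled.
- `parse_qualifiers` is a hypothetical name.

The function returns a list rather than a dict, because keys such as `/db_xref` repeat.

```python
import re

# /key="quoted value" (may span several lines) or /key=token
QUALIFIER_RE = re.compile(r'/(\w+)=(?:"([^"]*)"|(\S+))')


def parse_qualifiers(feature_text):
    """Return the (key, value) pairs of a feature, in file order."""
    pairs = []
    for match in QUALIFIER_RE.finditer(feature_text):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Continuation lines are indented: collapse the line breaks.
        value = " ".join(value.split())
        if key == "translation":
            # Protein sequences are wrapped without meaningful spaces.
            value = value.replace(" ", "")
        pairs.append((key, value))
    return pairs


# Excerpt of the CDS from slide 25; /translation is cut after two lines.
cds = '''     CDS             110..835
                     /gene="TRYX3"
                     /codon_start=1
                     /product="trypsin-X3 precursor"
                     /translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGV
                     LIHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDI"'''
for key, value in parse_qualifiers(cds):
    print(key, "=", value)
```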
  • 28. EMBL ID 7 standard; DNA; HTG; 5916 BP. AC chromosome:GRCh37:7:141951963:141957878:-1 [...] OS Homo sapiens (human) OC Eukaryota; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; OC Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; OC Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; OC Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; OC Hominoidea; Hominidae; Homininae; Homo. [...] FT gene 1..5916 FT /gene=ENSG00000171147 FT /locus_tag="U66059.56" FT /note="Trypsin-X3 Precursor (EC 3.4.21.4) [...] FT CDS join(352..391,2386..2524,2748..3004,5448..5587,5689..5838) FT /gene="ENSESTG00000027201" FT /protein_id="ENSESTP00000068598" FT /note="transcript_id=ENSESTT00000068598" FT /translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGVL FT IHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDIML [...] SQ Sequence 5916 BP; 1714 A; 1266 C; 1022 G; 1914 T; 0 other; AAGGCTGGCA AAAAGGAGAC CAGACAGGAG GCGTCTGTAG AGATATCATG AACTTCAACT 60 TAGCTTTGGT ACTTTCTTCC CTGAAGACAG AGGGCAGAAC TCTGAGTTCC AGAACCATTT 120 TCAACTGTAT TGGGGACCAA TCACTTGACT CTATTCTTGT CTCTCTGACA GATGACGCTA 180 CACTCTCCTC TGAATAATGG ACACCATTTC TAAAACTGAA TCCTGCTACT AAAATAATTC 240 [...] GTAATCCAAA ATAACTGAGC TGTGGCAGTT GTGGACCATA TGACACAGCT TGTCCCCATC 5880 GTTCACCTTT AGAATTAAAT ATAAATTAAC TCCTCA 5916 //
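In this EMBL entry, the CDS is located on the genomic sequence as a `join(...)` of five exons. The GenBank mRNA record (slide 25) gives a single interval, 110..835. Here is a minimal sketch that reads both kinds of location and sums the segment lengths. The helper names are hypothetical, and the sketch deliberately ignores strand (`complement(...)`), partial-end markers (`<`, `>`) and references to other entries.

```python
import re


def parse_location(location):
    """Turn '110..835' or 'join(352..391,2386..2524,...)' into (start, end) pairs.

    Coordinates stay 1-based and inclusive. Strand, partial-end markers
    (< and >) and remote references are ignored in this sketch.
    """
    return [(int(start), int(end)) for start, end in re.findall(r"(\d+)\.\.(\d+)", location)]


def spliced_length(location):
    """Total number of bases covered by the segments (the exons, for a CDS)."""
    return sum(end - start + 1 for start, end in parse_location(location))


embl_cds = "join(352..391,2386..2524,2748..3004,5448..5587,5689..5838)"
print(parse_location(embl_cds))
print(spliced_length(embl_cds), spliced_length("110..835"))  # 726 726
```

The two lengths agree, as expected if both records describe the same coding sequence.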
  • 29. Secondary sequence databases UniProt, Pfam, ProSite, ...
  • 30. UniProt http://www.uniprot.org/
  • 33. Example ID TRY3_HUMAN Reviewed; 304 AA. AC P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3; DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot. DT 14-OCT-2008, sequence version 2. DT 11-JAN-2011, entry version 111. DE RecName: Full=Trypsin-3; DE EC=3.4.21.4; DE AltName: Full=Brain trypsinogen; DE AltName: Full=Mesotrypsinogen; [...] CC -!- FUNCTION: Digestive protease specialized for the degradation of CC trypsin inhibitors. CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa. CC -!- COFACTOR: Binds 1 calcium ion per subunit. [...] DR PIR; S33496; S33496. DR RefSeq; NP_002762.2; NM_002771.3. DR UniGene; Hs.654513; -. DR PDB; 1H4W; X-ray; 1.70 A; A=81-304. [...] FT DISULFID 196 263 FT DISULFID 228 242 FT DISULFID 253 277 [...] SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64; MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI AANS //
  • 34. Details ID TRY3_HUMAN Reviewed; 304 AA. | name, origin ("Reviewed" means Swiss-Prot), size DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot. DT 14-OCT-2008, sequence version 2. DT 11-JAN-2011, entry version 111. | dates of entry into UniProt, of the last sequence modification and of the last entry modification DE RecName: Full=Trypsin-3; | protein name DE AltName: Full=Brain trypsinogen; DE AltName: Full=Mesotrypsinogen; DE AltName: Full=Serine protease 3; DE AltName: Full=Serine protease 4; DE AltName: Full=Trypsin III; | alternative names OS Homo sapiens (Human). | organism OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; OC Catarrhini; Hominidae; Homo. | taxonomy
  • 35. Details (2) RN [1] RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND B), AND VARIANT ALA-188. RC TISSUE=Brain; RX MEDLINE=94123994; PubMed=8294000; DOI=10.1016/0378-1119(93)90460-K; RA Wiegand U., Corbach S., Minn A., Kang J., Mueller-Hill B.; RT "Cloning of the cDNA encoding human brain trypsinogen and RT characterization of its product."; RL Gene 136:167-175(1993). | bibliographic reference CC -!- FUNCTION: Digestive protease specialized for the degradation of CC trypsin inhibitors. CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa. CC -!- COFACTOR: Binds 1 calcium ion per subunit. CC -!- SUBCELLULAR LOCATION: Secreted. | annotations (function, localization) DR PIR; S12764; S12764. DR PIR; S33496; S33496. DR RefSeq; NP_002762.2; NM_002771.3. DR UniGene; Hs.654513; -. | identifiers in other databases PE 1: Evidence at protein level; | level of confidence in the existence (expression) of the protein
  • 36. Details (3) FT MOD_RES 211 211 Sulfotyrosine (By similarity). FT DISULFID 87 217 FT DISULFID 105 121 [...] FT STRAND 111 117 FT HELIX 119 121 | sequence annotations SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64; MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI AANS | protein sequence // | end of entry
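Each UniProt line starts with a two-letter code (ID, DE, OS, FT, SQ...) followed by three spaces, so the content begins at column 6. Here is a minimal sketch that does three things:

- groups the lines by code;
- rebuilds the sequence from the lines after SQ;
- reads the disulfide bonds in the FT layout shown on the slide, with start and end as two separate columns.

Other releases of the format may write feature positions differently, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_uniprot(text):
    """Group a UniProt flat-file entry by line code.

    Returns (fields, sequence): fields maps codes such as 'ID', 'DE' or 'FT'
    to the list of their line contents; sequence is the SQ block without spaces.
    """
    fields = defaultdict(list)
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of entry
            break
        if in_sequence:
            # Sequence lines have no code: keep the residue letters only.
            residues.append("".join(line.split()))
            continue
        code, content = line[:2], line[5:].rstrip()
        fields[code].append(content)
        if code == "SQ":  # the following lines hold the sequence itself
            in_sequence = True
    return dict(fields), "".join(residues)


def disulfide_bonds(ft_lines):
    """Read (first, second) cysteine positions from 'DISULFID  a  b' contents."""
    bonds = []
    for content in ft_lines:
        parts = content.split()
        if len(parts) >= 3 and parts[0] == "DISULFID":
            bonds.append((int(parts[1]), int(parts[2])))
    return bonds


# Shortened excerpt of the entry on slides 33 to 36 (sequence cut after 30 residues).
entry = """ID   TRY3_HUMAN              Reviewed;         304 AA.
DE   RecName: Full=Trypsin-3;
FT   DISULFID     87    217
FT   DISULFID    105    121
SQ   SEQUENCE   304 AA;  32529 MW;  4C4303C310B7BFFC CRC64;
     MCGPDDRCPA RWPGPGRAVK CGKGLAAARP
//"""
fields, sequence = parse_uniprot(entry)
print(fields["ID"][0].split()[0], disulfide_bonds(fields["FT"]), sequence)
```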
  • 37. Remarks File extension: .txt; also available as .xml. Python: strings/lists + regular expressions (+ xml module)
  • 38. XML <?xml version='1.0' encoding='UTF-8'?> <uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <entry dataset="Swiss-Prot" created="1994-02-01" modified="2011-01-11" version="111"> <accession>P35030</accession> <accession>A9Z1Y4</accession> <accession>P15951</accession> <accession>Q15665</accession> [...] <dbReference type="NCBI Taxonomy" id="9606" key="2"/> <lineage> <taxon>Eukaryota</taxon> <taxon>Metazoa</taxon> <taxon>Chordata</taxon> [...] <feature type="disulfide bond"> <location> <begin position="228"/> <end position="242"/> [...] <feature type="strand"> <location> <begin position="133"/> <end position="137"/> [...] <sequence length="304" mass="32529" checksum="4C4303C310B7BFFC" modified="2008-10-14" version="2"> MCGPDDRCPARWPGPGRAVKCGKGLAAARPGRVERGGAQRGGAGLELHPLLGGRTWRAAR DADGCEALGTVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWVVSAAH CYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAVINARV STISLPTTPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLTQAECKASYPGKITNSM FCVGFLEGGKDSCQRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDWIKDTI AANS </sequence>
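With the XML version, the standard-library module `xml.etree.ElementTree` does the parsing. Every element lives in the `http://uniprot.org/uniprot` namespace declared on the root, so that namespace must be passed to `find`/`findall`. Here is a minimal sketch, assuming the element names shown on the slide; `summarize_entry` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <uniprot> root element.
NS = {"up": "http://uniprot.org/uniprot"}


def summarize_entry(xml_text):
    """Return accessions, disulfide bonds and sequence of the first entry."""
    root = ET.fromstring(xml_text)
    entry = root.find("up:entry", NS)
    accessions = [acc.text for acc in entry.findall("up:accession", NS)]
    bonds = []
    for feature in entry.findall("up:feature", NS):
        if feature.get("type") == "disulfide bond":
            begin = feature.find("up:location/up:begin", NS)
            end = feature.find("up:location/up:end", NS)
            bonds.append((int(begin.get("position")), int(end.get("position"))))
    sequence_elem = entry.find("up:sequence", NS)
    # The sequence text is wrapped over several lines: drop all whitespace.
    sequence = "".join(sequence_elem.text.split())
    return accessions, bonds, sequence


# Shortened excerpt of the entry on slide 38 (sequence cut after 60 residues).
xml_text = """<uniprot xmlns="http://uniprot.org/uniprot">
<entry dataset="Swiss-Prot" created="1994-02-01" modified="2011-01-11" version="111">
<accession>P35030</accession>
<accession>A9Z1Y4</accession>
<feature type="disulfide bond"><location><begin position="228"/><end position="242"/></location></feature>
<feature type="strand"><location><begin position="133"/><end position="137"/></location></feature>
<sequence length="304" version="2">
MCGPDDRCPARWPGPGRAVKCGKGLAAARPGRVERGGAQRGGAGLELHPLLGGRTWRAAR
</sequence>
</entry>
</uniprot>"""
print(summarize_entry(xml_text))
```

Compared with the flat file, there is no column counting here. The price is the namespace handling and a dependency on the XML schema.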
  • 39. Outline 1 Background 2 The problem 3 Sequences 4 Structures 5 A few precautions 6 Conclusion 7 References & graphics credits
  • 40. Protein Data Bank (PDB) Structures of DNA, RNA, proteins, viruses... determined by X-ray crystallography, NMR and cryo-electron microscopy
  • 41. PDB http://www.rcsb.org/pdb/home/home.do
  • 44. Example HEADER HYDROLASE (SERINE PROTEINASE) 26-OCT-81 2PTN TITLE ON THE DISORDERED ACTIVATION DOMAIN IN TRYPSINOGEN. TITLE 2 CHEMICAL LABELLING AND LOW-TEMPERATURE CRYSTALLOGRAPHY COMPND MOL_ID: 1; COMPND 2 MOLECULE: TRYPSIN; COMPND 3 CHAIN: A; COMPND 4 EC: 3.4.21.4; COMPND 5 ENGINEERED: YES SOURCE MOL_ID: 1; SOURCE 2 ORGANISM_SCIENTIFIC: BOS TAURUS; SOURCE 3 ORGANISM_COMMON: CATTLE; SOURCE 4 ORGANISM_TAXID: 9913 KEYWDS HYDROLASE (SERINE PROTEINASE) EXPDTA X-RAY DIFFRACTION [...] REMARK 2 RESOLUTION. 1.55 ANGSTROMS. [...] ATOM 273 N ALA A 55 6.294 11.611 25.982 1.00 9.30 N ATOM 274 CA ALA A 55 6.778 12.670 25.099 1.00 9.30 C ATOM 275 C ALA A 55 7.329 13.864 25.883 1.00 9.30 C ATOM 276 O ALA A 55 6.747 14.218 26.934 1.00 9.30 O ATOM 277 CB ALA A 55 5.636 13.154 24.190 1.00 9.30 C ATOM 278 N ALA A 56 8.461 14.383 25.454 1.00 7.97 N ATOM 279 CA ALA A 56 9.069 15.522 26.129 1.00 7.97 C ATOM 280 C ALA A 56 8.143 16.740 26.167 1.00 7.97 C ATOM 281 O ALA A 56 8.162 17.496 27.169 1.00 7.97 O ATOM 282 CB ALA A 56 10.414 15.918 25.506 1.00 7.97 C [...]
  • 45. PDB file layout: header first, then coordinates
  • 46. Coordinates Read by viewers such as PyMOL, RasMol, VMD..., and by Python scripts
  • 47. Coordinates ATOM 601 N LEU A 99 10.007 19.687 17.536 1.00 12.25 N ATOM 602 CA LEU A 99 9.599 18.429 18.188 1.00 12.25 C ATOM 603 C LEU A 99 10.565 17.281 17.914 1.00 12.25 C ATOM 604 O LEU A 99 10.256 16.101 18.215 1.00 12.25 O ATOM 605 CB LEU A 99 8.149 18.040 17.853 1.00 12.25 C ATOM 606 CG LEU A 99 7.125 19.029 18.438 1.00 18.18 C ATOM 607 CD1 LEU A 99 5.695 18.554 18.168 1.00 18.18 C ATOM 608 CD2 LEU A 99 7.323 19.236 19.952 1.00 18.18 C
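PDB coordinate records are fixed-width: each field occupies fixed columns, as described in the format documentation listed on slide 67. This is why the slides recommend string slices rather than `split()`. Here is a minimal sketch of an ATOM/HETATM reader using the PDB v2.3 column layout; `parse_atom` is a hypothetical name. The specification counts columns from 1, while Python slices start at 0 and exclude their end. So columns 31-38 become `line[30:38]`.

```python
def parse_atom(line):
    """Parse one ATOM/HETATM record by its fixed columns (PDB format v2.3)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16, atom name
        "alt_loc": line[16].strip(),      # column 17, alternate location
        "res_name": line[17:20].strip(),  # columns 18-20, residue name
        "chain": line[21],                # column 22, chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26, residue number
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }


# Record 602 from slide 47, with its original column alignment restored.
line = "ATOM    602  CA  LEU A  99       9.599  18.429  18.188  1.00 12.25           C"
atom = parse_atom(line)
print(atom["name"], atom["res_name"], atom["chain"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
# CA LEU A 99 9.599 18.429 18.188
```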
  • 49. Remarks Several chains; several structures (NMR); gaps (X-ray). Python: strings (slices) + lists
  • 50. Several chains ATOM 955 CD2 TYR A 117 28.547 16.730 59.818 1.00 34.54 C ATOM 956 CE1 TYR A 117 26.512 14.828 59.696 1.00 34.81 C ATOM 957 CE2 TYR A 117 28.117 16.089 60.985 1.00 35.96 C ATOM 958 CZ TYR A 117 27.100 15.139 60.917 1.00 35.42 C ATOM 959 OH TYR A 117 26.673 14.515 62.069 1.00 37.14 O ATOM 960 OXT TYR A 117 25.735 19.061 58.351 1.00 32.81 O TER 961 TYR A 117 ATOM 962 N ARG B 3 42.047 55.053 18.876 1.00 34.90 N ATOM 963 CA ARG B 3 42.680 56.307 19.383 1.00 35.03 C ATOM 964 C ARG B 3 43.365 56.041 20.722 1.00 33.56 C ATOM 965 O ARG B 3 42.720 55.647 21.691 1.00 33.47 O ATOM 966 CB ARG B 3 41.614 57.395 19.562 1.00 37.48 C ATOM 967 CG ARG B 3 40.638 57.499 18.394 1.00 41.05 C
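Column 22 holds the chain identifier, and a TER record closes each chain. Here is a minimal sketch that lists the residues of each chain from the ATOM/HETATM records; `residues_by_chain` is a hypothetical name. Each residue is recorded once, even though it spans several atom lines. Insertion codes (column 27) are ignored here.

```python
from collections import defaultdict


def residues_by_chain(lines):
    """Map each chain ID (column 22) to its ordered list of (number, name) residues."""
    chains = defaultdict(list)
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skips TER, MODEL, REMARK...
        chain = line[21]
        residue = (int(line[22:26]), line[17:20].strip())
        # Several consecutive atom lines belong to the same residue.
        if not chains[chain] or chains[chain][-1] != residue:
            chains[chain].append(residue)
    return dict(chains)


# Lines 960-962 from slide 50, with their original column alignment restored.
excerpt = [
    "ATOM    960  OXT TYR A 117      25.735  19.061  58.351  1.00 32.81           O",
    "TER     961      TYR A 117",
    "ATOM    962  N   ARG B   3      42.047  55.053  18.876  1.00 34.90           N",
]
print(residues_by_chain(excerpt))  # {'A': [(117, 'TYR')], 'B': [(3, 'ARG')]}
```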
  • 51. Several structures (models) MODEL 1 ATOM 1 N GLY A 1 11.935 -10.938 0.352 1.00 0.00 N ATOM 2 CA GLY A 1 13.344 -10.643 0.600 1.00 0.00 C ATOM 3 C GLY A 1 13.861 -9.576 -0.330 1.00 0.00 C ATOM 4 O GLY A 1 14.929 -9.728 -0.931 1.00 0.00 O [...] ATOM 934 HB2 GLU A 60 9.981 7.744 1.905 1.00 0.00 H ATOM 935 HB3 GLU A 60 10.321 6.103 2.451 1.00 0.00 H ATOM 936 HG2 GLU A 60 12.152 6.972 3.824 1.00 0.00 H ATOM 937 HG3 GLU A 60 11.700 8.597 3.310 1.00 0.00 H TER 938 GLU A 60 ENDMDL MODEL 2 ATOM 1 N GLY A 1 19.334 -6.988 0.864 1.00 0.00 N ATOM 2 CA GLY A 1 18.296 -6.813 1.874 1.00 0.00 C ATOM 3 C GLY A 1 18.000 -5.370 2.142 1.00 0.00 C ATOM 4 O GLY A 1 18.677 -4.724 2.959 1.00 0.00 O [...] ATOM 934 HB2 GLU A 60 11.353 9.615 -0.439 1.00 0.00 H ATOM 935 HB3 GLU A 60 13.095 9.643 -0.204 1.00 0.00 H ATOM 936 HG2 GLU A 60 13.380 10.930 -2.203 1.00 0.00 H ATOM 937 HG3 GLU A 60 11.654 10.817 -2.534 1.00 0.00 H TER 938 GLU A 60 ENDMDL
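NMR entries contain several conformers of the same molecule, each wrapped in MODEL ... ENDMDL. Atom serial numbers restart with each model, so a naive reader would mix several copies of every atom. Here is a minimal sketch that keeps one list of atom lines per model; `split_models` is a hypothetical name. A file without MODEL records, the typical X-ray case, comes back as a single model.

```python
def split_models(lines):
    """Split PDB lines into one list of ATOM/HETATM lines per model."""
    models, current = [], []
    for line in lines:
        record = line[0:6].strip()  # record name, columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:  # atoms not enclosed in MODEL/ENDMDL
        models.append(current)
    return models


# First two atoms of each model from slide 51, with column alignment restored.
nmr = [
    "MODEL        1",
    "ATOM      1  N   GLY A   1      11.935 -10.938   0.352  1.00  0.00           N",
    "ATOM      2  CA  GLY A   1      13.344 -10.643   0.600  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   GLY A   1      19.334  -6.988   0.864  1.00  0.00           N",
    "ATOM      2  CA  GLY A   1      18.296  -6.813   1.874  1.00  0.00           C",
    "ENDMDL",
]
print([len(model) for model in split_models(nmr)])  # [2, 2]
```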
  • 52. Gaps [...] ATOM 7568 CB LYS B 72 -59.462-109.221 -72.440 1.00 31.64 C ATOM 7569 CG LYS B 72 -58.524-109.915 -73.424 1.00 31.85 C ATOM 7570 CD LYS B 72 -58.889-109.602 -74.868 1.00 32.02 C ATOM 7571 CE LYS B 72 -58.174-110.533 -75.837 1.00 31.61 C ATOM 7572 NZ LYS B 72 -58.629-110.335 -77.242 1.00 31.27 N ATOM 7573 N GLY B 73 -61.309-106.416 -72.158 1.00 31.85 N ATOM 7574 CA GLY B 73 -62.485-105.832 -71.510 1.00 30.84 C ATOM 7575 C GLY B 73 -63.598-106.848 -71.303 1.00 29.65 C ATOM 7576 O GLY B 73 -64.660-106.750 -71.920 1.00 28.85 O ATOM 7577 N SER B 74 -63.354-107.820 -70.425 1.00 28.53 N ATOM 7578 CA SER B 74 -64.301-108.911 -70.179 1.00 27.75 C ATOM 7579 C SER B 74 -64.180-109.438 -68.754 1.00 26.72 C ATOM 7580 O SER B 74 -65.113-110.041 -68.227 1.00 24.48 O ATOM 7581 CB SER B 74 -64.070-110.058 -71.166 1.00 26.32 C ATOM 7582 OG SER B 74 -64.505-109.716 -72.470 1.00 25.54 O ATOM 7583 N GLN B 79 -62.682-105.888 -62.336 1.00 42.85 N ATOM 7584 CA GLN B 79 -63.246-104.902 -63.248 1.00 42.57 C ATOM 7585 C GLN B 79 -62.146-104.278 -64.103 1.00 42.60 C ATOM 7586 O GLN B 79 -60.992-104.191 -63.681 1.00 42.45 O ATOM 7587 CB GLN B 79 -63.996-103.819 -62.464 1.00 42.46 C ATOM 7588 CG GLN B 79 -64.950-102.964 -63.300 1.00 42.30 C ATOM 7589 CD GLN B 79 -66.093-103.764 -63.905 1.00 42.15 C ATOM 7590 OE1 GLN B 79 -66.388-104.879 -63.472 1.00 42.18 O ATOM 7591 NE2 GLN B 79 -66.743-103.194 -64.911 1.00 41.70 N ATOM 7592 N VAL B 80 -62.514-103.846 -65.305 1.00 42.30 N ATOM 7593 CA VAL B 80 -61.549-103.342 -66.275 1.00 42.03 C ATOM 7594 C VAL B 80 -60.882-102.055 -65.796 1.00 42.42 C ATOM 7595 O VAL B 80 -61.544-101.165 -65.260 1.00 43.09 O [...]
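This excerpt shows two pitfalls.

- Negative coordinates can fill their whole 8-character field, so x and y run together (`-59.462-109.221`) and `split()` would fail.
- The numbering of chain B jumps from 74 to 79: residues 75 to 78 have no coordinates, often because they are disordered in the X-ray map.

Here is a minimal sketch that reads residue numbers by column and reports such jumps; the helper names are hypothetical. A jump can also come from the authors' own numbering scheme, so this flags candidate gaps rather than proving them.

```python
def residue_numbers(lines, chain):
    """Ordered, de-duplicated residue numbers (columns 23-26) of one chain."""
    numbers = []
    for line in lines:
        if line.startswith("ATOM  ") and line[21] == chain:
            number = int(line[22:26])
            if not numbers or numbers[-1] != number:
                numbers.append(number)
    return numbers


def find_gaps(numbers):
    """Return (before, after) pairs where the numbering jumps by more than 1."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


# Residue numbers of chain B on the slide.
print(find_gaps([72, 73, 74, 79, 80]))  # [(74, 79)]
```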
  • 53. Outline 1 Background 2 The problem 3 Sequences 4 Structures 5 A few precautions 6 Conclusion 7 References & graphics credits
  • 54. A few precautions Stay cautious about the data
  • 55. GenBank Z71230 LOCUS Z71230 124 bp DNA linear PLN 14-NOV-2006 DEFINITION Nicotiana tabacum chloroplast JLA region, sequence 2. ACCESSION Z71230 VERSION Z71230.1 GI:1279604 KEYWORDS rpl2 gene; transfer RNA-His; trnH gene. SOURCE chloroplast Nicotiana tabacum (common tobacco) ORGANISM Nicotiana tabacum Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; asterids; lamiids; Solanales; Solanaceae; Nicotianoideae; Nicotianeae; Nicotiana. REFERENCE 1 (bases 1 to 124) AUTHORS Goulding,S.E., Olmstead,R.G., Morden,C.W. and Wolfe,K.H. TITLE Ebb and flow of the chloroplast inverted repeat JOURNAL Mol. Gen. Genet. 252 (1-2), 195-206 (1996) PUBMED 8804393 [...] FEATURES Location/Qualifiers source 1..124 /organism="Nicotiana tabacum" /organelle="plastid:chloroplast" /mol_type="genomic DNA" /isolate="Cuban cahibo cigar, gift from President Fidel Castro" /db_xref="taxon:4097" gene <1..11 /gene="rpl2"
  • 56. GenBank NC_001610 LOCUS NC_001610 17084 bp DNA circular MAM 14-APR-2009 DEFINITION Didelphis virginiana mitochondrion, complete genome. ACCESSION NC_001610 VERSION NC_001610.1 GI:5835037 DBLINK Project: 11806 KEYWORDS . SOURCE mitochondrion Didelphis virginiana (North American opossum) ORGANISM Didelphis virginiana Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Metatheria; Didelphimorphia; Didelphidae; Didelphis. REFERENCE 1 (bases 1 to 17084) AUTHORS Janke,A., Feldmaier-Fuchs,G., Thomas,W.K., von Haeseler,A. and Paabo,S. TITLE The marsupial mitochondrial genome and the evolution of placental mammals JOURNAL Genetics 137 (1), 243-256 (1994) PUBMED 8056314 [...] FEATURES Location/Qualifiers source 1..17084 /organism="Didelphis virginiana" /organelle="mitochondrion" /mol_type="genomic DNA" /isolate="fresh road killed individual" /db_xref="taxon:9267" /tissue_type="liver" /dev_stage="adult"
  • 57. GenBank 252544 LOCUS 252544 649 bp RNA linear VRL 19-SEP-2002 DEFINITION gene 7 3' end, 5' end, segment 7 [human rotavirus, strain Wa, Genomic RNA, 425 nt 2 segments]. ACCESSION VERSION GI:252544 KEYWORDS . SOURCE Human rotavirus A ORGANISM Human rotavirus A Viruses; dsRNA viruses; Reoviridae; Sedoreovirinae; Rotavirus; Rotavirus A. [...] FEATURES Location/Qualifiers source 1..649 /organism="Human rotavirus A" /mol_type="genomic RNA" /strain="Wa" /db_xref="taxon:10941" ORIGIN 1 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 61 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 121 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 181 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 241 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 301 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 361 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 421 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 481 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 541 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 601 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnn //
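This record declares 649 nucleotides, but its sequence is entirely `n` (unknown base). A simple sanity check before using any sequence is to measure its share of ambiguous positions. Here is a minimal sketch, assuming a clean sequence string without spaces or position numbers. The 0.5 threshold and the function names are arbitrary assumptions for illustration.

```python
def ambiguous_fraction(sequence):
    """Share of positions that are not A, C, G, T or U (e.g. 'n')."""
    if not sequence:
        return 0.0
    unknown = sum(1 for base in sequence.upper() if base not in "ACGTU")
    return unknown / len(sequence)


def looks_suspicious(sequence, threshold=0.5):
    """Flag sequences dominated by unknown bases (the threshold is arbitrary)."""
    return ambiguous_fraction(sequence) >= threshold


print(looks_suspicious("n" * 649), looks_suspicious("aaggctggca"))  # True False
```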
  • 58. PDB 7GBP, chain D, residue 67: Oops!
  • 59. Outline 1 Background 2 The problem 3 Sequences 4 Structures 5 A few precautions 6 Conclusion 7 References & graphics credits
  • 60. Data: sequences, structures... (shown over a background of raw DNA sequence)
  • 62. Standards exist... but they are not always followed
  • 63. Think about the objects you are handling
  • 65. Outline 1 Background 2 The problem 3 Sequences 4 Structures 5 A few precautions 6 Conclusion 7 References & graphics credits
  • 66. References J.-C. Gelly, course "Bases de données en biologie" (databases in biology). J.-M. Claverie and C. Notredame, Bioinformatics for Dummies. BioStar, "Incorrect / unusual entries in main databases (GenBank, UniProt, PDB)?", http://biostar.stackexchange.com/questions/10869/incorrect-unusual-entries-in-main-databases-genbank-uniprot-pdb
  • 67. References (2) FASTA format: http://en.wikipedia.org/wiki/FASTA_format GenBank: http://www.ncbi.nlm.nih.gov/ (format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html) UniProt: http://www.uniprot.org/ (format: http://www.uniprot.org/manual/) PDB: http://www.rcsb.org/pdb/home/home.do (format: http://www.wwpdb.org/documentation/format23/v2.3.html)
  • 68. Graphics credits Squidonius (Wikimedia), Ralphbijker (Flickr), USDA/ARS, Viktorvoigt (Wikimedia), Icons-Land (Findicons), herzogbr (Flickr), Icons-Land (Findicons), PAPYRARRI (Flickr)