DNA methylation coverage in two tissues of the Pacific Oyster
Claire Ellis
Bioinformatics Terminal Product
3/14/13

DNA methylation patterns in Crassostrea gigas

Epigenetics describes DNA modifications that change gene expression without altering nucleotide sequence.
DNA methylation in organisms is extremely diverse, variable among species, and can change genome function under external influences.

[Figure: DNA methylation (CH3 group attached to DNA). Source: http://www.nist.gov/pml/div689/dna]

Sequencing Approaches

Bisulfite sequencing was used to examine DNA methylation in gonad tissue
MBD-Seq was used to examine DNA methylation in gill tissue (Mackenzie)

Bisulfite sequencing
Cm = methylated cytosine
C = unmethylated cytosine

Before bisulfite treatment:
5’ ACmGTTCGCTTGAG 3’
3’ TGCmAAGCGAACTC 5’

After bisulfite treatment:
5’ ACmGTTUGUTTGAG 3’
3’ TGCmAAGUGAAUTU 5’
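
The conversion on this slide can be sketched in a few lines of Python. This is an illustration only and not part of the original analysis: the function name `bisulfite_convert` and the representation of a strand as a list of base tokens (with `Cm` standing for a methylated cytosine) are assumptions made for the example.

```python
def bisulfite_convert(bases):
    """Convert unmethylated cytosines (C) to uracil (U).

    `bases` is a list of tokens; "Cm" marks a methylated cytosine,
    which bisulfite treatment leaves unchanged.
    """
    return ["U" if base == "C" else base for base in bases]


# Top strand from the slide, written 5' to 3' as tokens
top = ["A", "Cm", "G", "T", "T", "C", "G", "C", "T", "T", "G", "A", "G"]
converted_top = "".join(bisulfite_convert(top))
print(converted_top)  # ACmGTTUGUTTGAG
```

Only the unmethylated cytosines change, which is what lets the later alignment step tell methylated from unmethylated positions.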
Approach

Bisulfite-converted reads were aligned to the genome, and a % methylation value per base was calculated by processing the alignments
methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing
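
As a rough sketch of the per-base calculation: after bisulfite treatment, a read that still shows C at a cytosine position counts as methylated and a read that shows T counts as unmethylated, so % methylation is the share of C reads among all reads covering the base. The function name `percent_methylation` and the read counts below are hypothetical and are not taken from the oyster data.

```python
def percent_methylation(num_c, num_t):
    """Return % methylation at one cytosine from read counts.

    num_c: reads that still show C (protected, i.e. methylated)
    num_t: reads that show T (converted, i.e. unmethylated)
    """
    coverage = num_c + num_t
    if coverage == 0:
        raise ValueError("no reads cover this base")
    return 100.0 * num_c / coverage


# Hypothetical counts for two bases, each covered by 11 reads
high = percent_methylation(10, 1)
low = percent_methylation(1, 10)
print(round(high, 2), round(low, 3))  # 90.91 9.091
```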
methylKit

Goal: obtain methylation coverage and examine differential methylation between gonad and gill tissues
 Read annotation files and perform basic statistical analyses for differentially methylated regions or bases

Methylation Statistics

[Figure: two histograms of % CpG methylation, one for gonad and one for gill. x-axis: % methylation per base (0 to 100); y-axis: frequency. Largest bar labels: 94.7 (gonad), 16.8 (gill).]

Coverage Statistics

[Figure: two histograms of CpG coverage, one for gonad and one for gill. x-axis: log10 of read coverage per base; y-axis: frequency. Largest bar labels: 91.6 (gonad), 16.9 (gill).]

CpG base correlation

[Figure: CpG base Pearson correlation plot for the gonad sample (~/Desktop/TJGR_GonadPE_BS_v9_90_CG_methylkit_modified.txt) and the gill sample (~/Desktop/TJGR_gillMBD_BS_v9_10x_methylkit_modified.tabular.txt), with both axes running from 0.0 to 1.0. Pearson correlation between gonad and gill: 0.068.]
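
For readers unfamiliar with the statistic, here is a minimal Python sketch of a Pearson correlation between two per-base methylation profiles. It is not the methylKit implementation; the function name `pearson` and the example values are hypothetical, and the sketch assumes both lists cover the same bases in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical % methylation values at three shared bases
sample_a = [0.0, 50.0, 100.0]
sample_b = [10.0, 60.0, 110.0]
r_same = pearson(sample_a, sample_b)        # profiles move together
r_opposite = pearson(sample_a, sample_b[::-1])  # profiles move oppositely
print(r_same, r_opposite)
```

A value near 0, like the 0.068 reported on this slide, means the two profiles show almost no linear relationship.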
Methylation Clustering

[Figure: CpG methylation clustering dendrogram of the two samples. x-axis: samples; y-axis: height. Blue = gonad, red = gill.]
Distance method: "correlation"; Clustering method: "ward"

PCA: Principal Component Analysis

[Figure: CpG methylation PCA of the two samples. x-axis: PC1; y-axis: PC2. Blue = gonad, red = gill.]

Conclusions

Additional analyses included examining the type of differential methylation (hypo and hyper)
  Extracted bases with a q-value < 0.01 and % methylation difference > 25%
The methylKit package was successfully used to characterize DNA methylation
Differences between gonad and gill methylation profiles may be due to library prep
Will use R script for future analyses comparing different samples’ methylation profiles
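
The extraction step above (q-value < 0.01 and % methylation difference > 25%) can be sketched as a simple filter. This is an illustrative Python sketch, not the R script used in the analysis: the function name `classify_bases`, the record layout, the positions, and the sign convention (a positive difference is called hyper) are all assumptions.

```python
def classify_bases(records, q_cutoff=0.01, diff_cutoff=25.0):
    """Split significant bases into hyper- and hypomethylated lists.

    Each record is (position, qvalue, meth_diff), where meth_diff is the
    % methylation difference between the two samples. Which sample is
    subtracted from which is an assumption of this sketch.
    """
    hyper, hypo = [], []
    for position, qvalue, meth_diff in records:
        # Keep only bases passing both cutoffs
        if qvalue >= q_cutoff or abs(meth_diff) <= diff_cutoff:
            continue
        if meth_diff > 0:
            hyper.append(position)
        else:
            hypo.append(position)
    return hyper, hypo


# Hypothetical records, not taken from the oyster data
records = [
    ("scaffold1:100", 0.001, 40.0),   # passes both cutoffs, hyper
    ("scaffold1:250", 0.001, -60.0),  # passes both cutoffs, hypo
    ("scaffold2:75", 0.200, 80.0),    # fails the q-value cutoff
    ("scaffold2:90", 0.005, 10.0),    # fails the difference cutoff
]
hyper, hypo = classify_bases(records)
print(hyper, hypo)  # ['scaffold1:100'] ['scaffold1:250']
```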

Methylation Statistics

> getMethylationStats(gonad,plot=F,both.strands=F)
methylation statistics per base
summary (gonad):
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.091 100.000 100.000  97.360 100.000 100.000
Percentiles (gonad):
    0%   10%   20%   30%   40%   50%   60%   70%   80%   95% 99.5% 99.9%  100%
  9.09   100   100   100   100   100   100   100   100   100   100   100   100
summary (gill):
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    0.00   54.55   78.70   69.56   91.89 100.000
Percentiles (gill):
    0%   10%   20%   30%   40%   50%   60%   70%   80%   95% 99.5% 99.9%  100%
     0  21.4  46.6  61.1  70.9  78.7  84.6    90  93.7   100   100   100   100

Editor's Notes

  1. The objective of this study is to use C. gigas as a model organism to characterize the distribution and identify potential functions of DNA methylation
  2. Input = % methylation value
  3. % methylation per cytosine. Gonad: most bases were 100% methylated. Gill: variation in % methylation; most bases were 100% methylated, with a small peak at low % methylation. Differences are due to library prep (gill was enriched for methylation first, so much higher % methylation).
  4. Read coverage distribution: histogram of read coverage per cytosine
  5. Pairwise correlation score between the % methylation profiles. Scatterplots of % methylation scores.
  6. PCA of two oyster tissue profiles, shows principal component 1 and 2 for each sample. Samples closer to each other in principal component space are similar in their methylation profiles.
  7. Hypo = under the level or par. Hyper = over the limit or above normal level.