Pathway-Based Approach to Analyze Genome-Wide
Association Study of Pancreatic Adenocarcinoma Survival
Using Pre-Defined Gene Sets and Pathway Analysis Software
by Jeanette Wong
Mentor: Jason A. Willis, Memorial Sloan Kettering Cancer Center
Robert J. Klein, Principal Investigator, Memorial Sloan Kettering Cancer Center
Advisor: Dr. Erin O’Leary, Ph.D., Bronx High School of Science
Genome-wide association studies (GWAS) have identified single-locus markers and SNPs
associated with pancreatic cancer; however, complex diseases such as pancreatic cancer develop
through multiple rare genetic mutations or variations rather than a single SNP or gene mutation.
Pathway analyses draw supplemental information from GWAS results to further analyze and
understand disease etiology. In this study, two publicly available pathway analysis software
programs, GSA-SNP and ICSNPathway, were run with standard parameters on GWAS results.
The goal of this research is to assess results from a GWAS of pancreatic cancer survival in order
to identify pathways associated with disease progression and to locate genetic mutations that
predispose some individuals to pancreatic cancer and influence a patient’s overall prognosis.
Results from this study provide insight into mechanisms of pancreatic cancer and their
relationship to candidate pathways derived from pathway analyses. A literature
survey confirms the significance and relevance of candidate pathways to pancreatic cancer.
Introduction
Pancreatic adenocarcinoma kills 95.4% of patients within five years of initial diagnosis.1
Pancreatic cancer is one of the most fatal cancers because symptoms do not become apparent
until late stages, so only 10-20% of patients are eligible for resection. After resection, the
median survival time is approximately 11-20 months, and the 5-year survival rate is
approximately 7-25%. Resection is the only treatment that has the potential to cure pancreatic
adenocarcinoma. While treatments such as chemotherapy may improve survival by 10-15%, they
cannot cure pancreatic cancer. Patients who are diagnosed at a late stage of pancreatic cancer
are usually not eligible for resection. These patients have a median survival time of 6-11
months after diagnosis. Patients with metastatic pancreatic cancer have a median survival time
of 2-6 months.2
The key to surviving pancreatic adenocarcinoma is early detection and diagnosis, when resection
is still a possible treatment with the potential to cure the cancer.
Approximately 10% of patients with pancreatic cancer have a family history of pancreatic
cancer.3
Familial pancreatic cancer is transmitted in an autosomal dominant manner, with
approximately 17-19% of families carrying BRCA2 mutations. Pancreatic cancer may also arise
in hereditary syndromes such as familial atypical multiple mole melanoma (FAMMM) syndrome
and Peutz-Jeghers syndrome. Molecular alterations such as Kras (proto-oncogene)
activation, p53 (tumor-suppressor gene) inactivation, and changes in SMAD4 and p16 signaling
can be found in approximately 80% of pancreatic adenocarcinoma patients.2
Known germ-line mutations account for approximately 10-20% of the clustering of pancreatic
cancer in families with a history of the disease.4
Germ-line refers to the DNA that offspring inherit from their parents, whereas somatic
mutations are acquired during a person's lifetime and are not part of the inherited DNA.
Pathway-based approaches examine whether a group of genes in the same functional biological
pathway is associated with a trait of interest or disease.5, 6-9
Previous studies have hypothesized that disease risk may be driven by numerous rare variants,
and pathway analysis can bring out less obvious genetic factors associated with disease.10, 11
Previous studies have shown that the molecular pathways involved in the progression from
benign lesions to malignant pancreatic cancer have a role in metastasis and therefore survival.12
A greater understanding of the molecular pathogenesis of pancreatic cancer may allow for the
development of novel targeted treatments and identification of early precursor lesions.13
Genome-wide association (GWA) studies have typically focused on the analysis of single markers,
testing for association between a single-nucleotide polymorphism (SNP) marker and a trait of
interest. An essential goal of GWAS is to find the genetic mechanisms that drive disease by
identifying germ-line variants at loci associated with the disease. Pathway-based approaches
have been developed that use biological knowledge of gene function to extract more power from
GWAS results. Previous GWAS pathway analyses have targeted diseases other than
pancreatic cancer, such as breast cancer and Alzheimer’s disease.14, 15
In pathway analysis, a ‘pathway’ is defined as a set of related genes, and not necessarily a
physically networked pathway. Using prior biological knowledge of overrepresented pathways
in GWAS data, pathway classification analysis can help prioritize
pathways that are most likely to be associated with disease. By incorporating gene networks and
pathway classification tools into the analysis of GWAS data, molecular pathways can add depth
to single-locus genome-wide association studies. Several pathway classification analysis tools
and databases are currently available; these tools sort genes into pre-defined pathways of
cellular processes based on genomic and molecular information.
Parts of genomes are inherited together, so every SNP gives information about several other
genetic variations on the same chromosome. To account for linkage disequilibrium (LD)
patterns within a genome, pathway analysis maps each SNP back to an LD block,
which may contain several genes within a specified distance. In pathway analysis, a threshold
p-value is selected in order to prioritize output.5
Larger pathways, which contain more genes and therefore more genotyped SNPs, are expected to
show more associated SNPs by chance alone.11
Pathway-based approaches assess whether the test statistics for a group of related genes show a
consistent yet moderate deviation from chance. Genes are not fully functional in isolation, and
complex molecular pathways tend to be more closely related to disease susceptibility and disease
progression. In pathway-based association tests for GWAS, a database of predefined gene
sets for pathways has been created based on prior biological knowledge. The significance of
each pathway can be summarized from the association of markers in or near genes that are
components of that pathway. Multiple related genes in the same functional
pathway may contribute to disease progression and pathogenesis. Pathway analysis complements
conventional GWAS by identifying additional susceptibility genes, and it can be
used to help explain missing heritability in genome-wide association studies.16
A known problem is that when one gene marker is tested at a time,
coherent patterns cannot be found among significant genes, which makes biological
interpretation of GWAS difficult. Gene set analysis (GSA) methods use pre-defined gene sets that
are grouped by biological function and expression. GSA determines the
significance of pre-defined sets of genes with respect to an outcome variable. In this study, the
outcome variable is disease survival. Gene sets can capture coordinated expression patterns of
genes of interest. An essential goal of genome-wide
association studies is to prioritize the biological functions or related biological networks
relevant to a trait of interest. Pre-defined gene sets or pathways can help
interpret the results from a GWAS.17
The goal of this research is to assess the significance of pathways
from germ-line mutation studies in order to identify pathways significantly associated with
pancreatic cancer. Results provide insight into mechanisms of pancreatic cancer and their
relationship to candidate pathways derived from pathway-based analyses. It is hypothesized that
pathway analysis based on results from genome-wide association studies will be a reliable
indicator of candidate pathways associated with the development and metastasis of pancreatic
cancer.
Materials and Methods
Input for Pathway Analysis - Pancreatic Cancer Survival GWAS
A genome-wide association study (GWAS) was conducted prior to pathway analysis.
DNA samples from 252 patients diagnosed with pancreatic adenocarcinoma were collected via
blood or cheek cell samples. The 252 patients were enrolled in a study at a research institution
and consented to offer a DNA sample. The DNA was genotyped using an Illumina CNV370-duo
SNP genotyping array (~340k SNP markers). After DNA samples of patients were collected,
clinical information of each patient was tracked (e.g. survival time, treatment plan). Results of
the GWAS are p-values assigned to SNPs, without individual-level genotypes.
Input for Pathway Analysis – Example Dataset: Height
GSA-SNP provided an example dataset of 100 samples of DNA in the format of SNPs
and p-values from a Korean population for height (P_GWAS < 4x10^-6). SNPs were obtained by
computing labels of SNP microarray data from the Korean Association Resources (KARE)
project, and genotyping was then completed with PLINK software. The genotypes of
a total of 2,168,896 SNPs were imputed using PLINK, with 799,492 of them passing P_Height >
1x10^-6. The p-values for all resulting SNPs were gathered and used as an input variable.17
Standard Thresholds and Parameters for Pathway Analyses
A p-value in the input data is a probability statement used to test the null hypothesis: the
smaller the p-value, the stronger the evidence against the null hypothesis. The p-value
is compared to a significance threshold. For the pathway analyses of this study, the standard
cutoff is 0.001 (1x10^-3), and any p-value below this threshold is considered statistically
significant. When SNPs are mapped to genes, a SNP located between the 5’ end of the first exon
and the 3’ end of the last exon of a gene is assigned to that gene. A SNP located within ±20 kb
of these boundaries (20 kb upstream or downstream of the gene) is also assigned to the gene, in
order to take account of surrounding regulatory regions and linkage
disequilibrium (LD) neighborhoods. (Linkage disequilibrium occurs between disease alleles and
marker alleles; GWAS can identify disease-associated alleles when mapped from significant
SNPs.)10
If a given SNP is assigned to more than one gene, the SNP is subject to being
reanalyzed. The Gene Ontology gene set database is used to provide a broad spectrum of gene
sets for enrichment testing in genomics research.1
The standard gene set size of 10-200 genes (minimum-maximum) was selected to avoid overly
narrow or overly broad functional categories in the Gene Ontology database. The q-value in
GSA-SNP represents the false discovery rate (FDR) for the analysis, a correction for false
positive results arising from multiple testing.
The standard FDR cutoff for pathway-based analysis was set at ≤0.05.
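The SNP-to-gene mapping rule and the gene set size limits described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GSA-SNP implementation: the function names, the tuple layout, and the gene coordinates are hypothetical, and a SNP that falls within the padded interval of several genes is simply recorded under each of them.

```python
from collections import defaultdict

PADDING_BP = 20_000                      # +/-20 kb padding around each gene
MIN_SET_SIZE, MAX_SET_SIZE = 10, 200     # gene set size limits used in this study


def map_snps_to_genes(snps, genes, padding=PADDING_BP):
    """Return {gene: [snp_id, ...]} for SNPs inside the padded gene interval.

    snps  : iterable of (snp_id, chromosome, position)
    genes : iterable of (gene_name, chromosome, start, end) with start <= end
    """
    mapping = defaultdict(list)
    for snp_id, snp_chrom, pos in snps:
        for gene, gene_chrom, start, end in genes:
            if snp_chrom == gene_chrom and start - padding <= pos <= end + padding:
                mapping[gene].append(snp_id)
    return dict(mapping)


def filter_gene_sets(gene_sets, lo=MIN_SET_SIZE, hi=MAX_SET_SIZE):
    """Keep only gene sets whose number of genes lies in [lo, hi]."""
    return {name: members for name, members in gene_sets.items()
            if lo <= len(members) <= hi}


# Hypothetical coordinates, for illustration only.
genes = [("GENE_A", "1", 100_000, 150_000), ("GENE_B", "1", 300_000, 320_000)]
snps = [("rs1", "1", 85_000),     # 15 kb upstream of GENE_A: mapped
        ("rs2", "1", 120_000),    # inside GENE_A: mapped
        ("rs3", "1", 200_000),    # more than 20 kb from both genes: unmapped
        ("rs4", "2", 120_000)]    # different chromosome: unmapped
snp_map = map_snps_to_genes(snps, genes)
print(snp_map)
```

The nested loop is quadratic and is kept only for readability; a genome-scale mapping would normally use sorted intervals or an interval tree.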
GSA-SNP: Gene Set Analysis with SNP Input
Gene set analysis (GSA) has been introduced to genome-wide association studies with the
goal of identifying associations between disease and groups of genes that share a common
biological function. With GSA, the power of GWAS can be increased substantially, as
association patterns may be found across gene sets. The data input windows are shown in
Figures 1, 2, 3, and 4, which correspond to the steps taken to enter properly formatted data.
GSA-SNP is computational software that is freely available, along with an example dataset, at
http://gsa.muldas.org 17
The input format used for GSA-SNP was a list of p-values, one for each SNP from a GWAS.
The gene-set analysis works by first taking the “–log” of every individual SNP p-value. A
feature of GSA-SNP is the use of the “k-th best p-value” (k = 1, 2, 3, 4, or 5) for each
gene, which allows gene scores to be more evenly distributed. For this experiment, k=2, the
second best SNP in each gene, was set as the standard to summarize the values of multiple SNPs.
If k=1 were used, the score of each gene would depend only on its single best SNP. Each SNP is
mapped to its nearest gene within 20 kb. Larger k-values tend to lower the power of results.17
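The “k-th best p-value” gene score can be illustrated with a short sketch. It is not taken from the GSA-SNP source code: the function name is hypothetical, the logarithm is assumed to be base 10, and the fallback for genes with fewer than k SNPs (using the worst available p-value) is an assumption made for the example.

```python
import math


def gene_score(p_values, k=2):
    """Return -log10 of the k-th smallest p-value among the SNPs of one gene.

    If the gene has fewer than k SNPs, its largest available p-value is used
    (an assumption of this sketch, not a documented GSA-SNP rule).
    """
    if not p_values:
        raise ValueError("gene has no mapped SNPs")
    ranked = sorted(p_values)
    kth_best = ranked[min(k, len(ranked)) - 1]
    return -math.log10(kth_best)


# Hypothetical p-values for the SNPs mapped to one gene.
snp_p = [1e-6, 1e-3, 0.2, 0.5]
print(gene_score(snp_p, k=1))   # driven by the single best SNP
print(gene_score(snp_p, k=2))   # driven by the second-best SNP
```

With k=1 one extreme SNP determines the gene score, whereas k=2 requires a second supporting SNP, which is the motivation given above for the choice of k=2.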
Procedures for using GSA-SNP
1. Run GSA-SNP: Execute run.sh (Unix/Linux) on a computer. (Figure 1)
2. Breakdown of pathway analysis using GSA-SNP program
- Click the “…” button and choose a data file.
- Click the “Upload” button to detect the data type (SNP, Gene, or Haplotype).
- The program will show relevant input options.
The GSA-SNP input is a SNP data file. The program automatically detects the data type by
reading the first ten lines of the input file. The row identifier for SNP data is rs#####. The
first column of the input file is the rs number of a SNP, and the second column is the
p-value for that SNP. Figure 1 shows the initial window after the GSA-SNP Java file is executed.
Figure 2 shows the pop-up window that appears after the “open” button is clicked. Figure 3
shows how parameters for the analysis are set. In this experiment, the parameters set as
standards were entered. Figure 4 shows the window when all data and parameters have been
entered into the analysis program and the pathway analysis is ready to be run.
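Before uploading, the two-column input described above can be checked with a small script. The sketch below mimics the idea of inspecting the first ten lines; it is not how GSA-SNP itself detects the data type. The whitespace delimiter, the function name, and the example rs numbers are assumptions.

```python
import re

RS_ID = re.compile(r"^rs\d+$")   # row identifier of the form rs#####


def looks_like_snp_input(lines, n_check=10):
    """Check that the first n_check non-empty lines are '<rs id> <p-value>'."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 2 or not RS_ID.match(fields[0]):
            return False
        try:
            p = float(fields[1])
        except ValueError:
            return False
        if not 0.0 <= p <= 1.0:          # a p-value must lie in [0, 1]
            return False
        checked += 1
        if checked == n_check:
            break
    return checked > 0


# Hypothetical rows, for illustration only.
example = ["rs0000001\t0.0004", "rs0000002\t0.37", "rs0000003\t1e-5"]
print(looks_like_snp_input(example))
print(looks_like_snp_input(["GENE_A\t0.01"]))
```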
Figure 1. Click the “…” button to manually select the input data file.
Figure 2. Select a file. Click the “open” button, then “upload”.
Data Parameters: GSA-SNP applies “–log” to every p-value in the input data. For SNP data,
padding extends the region used to map SNPs to genes in order to account for LD. ±20 kb is the
set standard threshold.
Figure 3. Gene set parameters of the Gene Ontology (GO) database. Gene set size is set to
range from 10 (minimum) to 200 (maximum) genes in a gene set to avoid overly narrow or broad
functional gene sets. The q-value is the false discovery rate (FDR), and is set by default
at ≤ 0.05.
Figure 4. The analysis begins when “Run” is clicked. The progress of the analysis is shown in
the bottom bar of the program window. When the analysis is complete, results appear on the
right of the program window. Within the GSA-SNP software program, the Z-statistic method is
employed to provide a corrected p-value. In the output, the “z-score” represents results from
this algorithm.
3. Results: When the computation is complete, the results appear on the right side of
the program window, formatted into rows and columns. The columns are: gene set name, gene
count in each gene set, gene set size, z-score, corrected p-value (q-value), and names of genes
within each gene set.
Figure 5. The computation results of GSA-SNP of pancreatic cancer survival data with the
application of the Z-statistic method and all standardized parameters. Results are ordered in
decreasing significance of pathways based on p-value of gene sets.
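To make the z-score and q-value columns more concrete, the sketch below computes a gene set Z-statistic in one common form (the mean gene score of the set compared with the mean of all gene scores, scaled by the standard deviation and the set size) and then applies a Benjamini-Hochberg correction. Both formulas are assumptions chosen for illustration; the exact statistics implemented in GSA-SNP may differ, and all numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_statistic(set_scores, all_scores):
    """Z = (mean of set scores - mean of all scores) * sqrt(m) / sd of all scores."""
    m = len(set_scores)
    sigma = pstdev(all_scores)
    if m == 0 or sigma == 0:
        raise ValueError("need a non-empty set and non-constant gene scores")
    return (mean(set_scores) - mean(all_scores)) * math.sqrt(m) / sigma


def upper_tail_p(z):
    """One-sided p-value of z under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


# Hypothetical gene scores (-log p-values); not taken from the study.
all_scores = [0.1, 0.3, 0.5, 0.7, 0.9, 2.5, 3.0, 3.5]
set_scores = [2.5, 3.0, 3.5]
z = z_statistic(set_scores, all_scores)
print(z, upper_tail_p(z))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A gene set would then be reported as significant when its q-value is at or below the FDR cutoff of 0.05 used in this study.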
ICSNPathway: Identification of Candidate Causal SNPs and Pathways
ICSNPathway is a freely available web server developed to analyze SNPs
from GWAS and identify pathways associated with a trait of interest. ICSNPathway takes a
distinctive approach to linkage disequilibrium (LD): it uses HapMap population data to map
SNPs to genes more accurately for pathway analysis. Figure 6 shows
the ICSNPathway web page with all the parameters set for the
pathway analysis. Figure 6 is not the initial web page display, but it closely resembles the
input page shown in Figure 7. Figure 8 shows how the data is analyzed within the ICSNPathway
program and how the chosen parameters are
applied. Results of the pathway analysis can be downloaded as a text file, as shown
in Figure 9. Output data is ordered from lowest to highest p-value. With properly prepared
input data and parameters, an ICSNPathway run completes within minutes.18
Figure 6. Parameters set for the KARE Height data input. Standardized parameters are
applied to the analysis.
Figure 7. The home page of the ICSNPathway web server program. All input information is
completed before the analysis begins: the GWAS SNP p-value file is uploaded, LD neighborhood
parameters are selected, and the standardized parameters selected for this experiment are
applied. Clicking “RUN” starts the analysis.
Figure 8. Diagram of how ICSNPathway functions overall.18
Figure 9. After the ICSNPathway analysis is completed, the output is listed on the result page
online. The output can also be downloaded as a text file. The results are organized in
columns: Index (ranking), Candidate causal pathway, Gene set URL, Description of Gene
Set, Nominal P-value, and FDR.
Results
Output data from the GSA-SNP software is formatted as a spreadsheet of rows and columns, so
that data can be sorted in ascending or descending order by p-value, z-score, etc. Table 1
compares the output values of the two pathway analysis tools in order to assess the stability
of pathway analyses across different tools. Figure 10 and Figure 11
are graphs that show how much the rankings and output values from the two programs differ
or agree.
Results of Two Different Pathway Tools Analyzing the Same GWAS Data Input
All the data represented in the results of this study are a part of a broader genetics study to
analyze the effect of germ-line pathways that trigger or have association for an inherited trait or
for the development of disease. 5
Table 1. Comparison of pathway analysis results from two different software programs, GSA-SNP
and ICSNPathway. Similar pathway names, their rankings, and their p-values are organized in
the table.
Figure 10. Rankings comparison of overlapping pathways appearing in the results of both GSA-SNP
and ICSNPathway for the control KARE Height dataset. There is no consensus of
rankings, even though the same parameters were set.
Figure 11. Pathway p-value comparison of overlapping pathways appearing in the results of
both GSA-SNP and ICSNPathway for the control KARE Height dataset, demonstrating how the
application of different algorithms yields different computation results.
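One way to quantify the agreement of rankings between two tools is a Spearman rank correlation over the pathways that both report. This measure was not part of the original analysis; it is offered only as a sketch, and the pathway names and ranks below are placeholders rather than results from this study.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            result[order[j]] = average_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5   # undefined if either vector is constant


# Placeholder rankings from two tools (hypothetical, not study results).
tool_a = {"pathway_1": 1, "pathway_2": 2, "pathway_3": 3, "pathway_4": 4}
tool_b = {"pathway_1": 3, "pathway_2": 1, "pathway_3": 4, "pathway_5": 2}
shared = sorted(set(tool_a) & set(tool_b))          # pathways reported by both
rho = spearman([tool_a[p] for p in shared], [tool_b[p] for p in shared])
print(shared, rho)
```

A value near 1 would indicate that the two tools rank the shared pathways similarly, while a value near 0 would correspond to the lack of consensus described in Figure 10.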
Comparison of GSA-SNP and ICSNPathway
The results from ICSNPathway are less specific than those of GSA-SNP. Because the
same standardized parameters were used, similar results might have been expected, but there is
very little overlap. GSA-SNP has a more finely divided Gene Ontology database, in
which certain pathways are classified in greater detail and carry different p-values. This may
have skewed the comparison of the two software programs. Regardless, there is some
consensus among the top pathways in the two sets of output. Table 4 shows the output
values of the GSA-SNP pathway analysis, ordered by ascending p-value. Table 5 shows the top
ranked pathways and their statistical values as computed by the GSA-SNP pathway analysis
program. Figure 10 shows the rankings of overlapping pathways appearing in the results of both
GSA-SNP and ICSNPathway. Figure 11 shows the p-value comparison of overlapping pathways
appearing in the results of both pathway analysis programs, demonstrating different values.
Table 4. List of the highest ranked Gene Ontology categories for SNP association with GWAS
pancreatic cancer survival data (p-values ≤ 0.001) from the GSA-SNP pathway analysis. 253
pathways appeared in the results of the GSA-SNP pathway analysis of the pancreatic cancer
survival GWAS results, and only the most highly significant pathways were selected for a
literature survey in search of relevance to pancreatic cancer.
Table 5. After the top ranked pathways for pancreatic cancer were selected from the GSA-SNP
pathway analysis, a literature survey was completed using search engines to find literature
containing terms such as pancreatic cancer, metastasis, survival, progression, and the name of
a pathway. This table cites one example of published literature found in the search for each
pathway, as evidence supporting the association between the pathway and pancreatic cancer
survival factors (metastasis, tumor growth, cancer progression, cell regulation, cellular
invasion, etc.). The ten most strongly associated pathways are presented.
Discussion
Interpretation of Results
The results of this study identify candidate pathways associated with pancreatic
cancer survival, metastasis, carcinogenesis, and underlying biological-genetic mechanisms.
The analysis does not necessarily identify the most highly associated pathways
accurately, as pathway rankings do not necessarily correlate with disease pathogenesis. This
study supplements other findings within the same research discipline, in
which somatic mutations have been reported to be predominantly responsible for the
development, risk, and metastasis of pancreatic cancer.
Analysis in Context
Do pathway analyses effectively further the findings from genome-wide association studies?
Pathway analyses further the findings from genome-wide association studies to an
extent. Because the output ranks differ considerably between the two
programs even though the same input data and parameters were used, the results are ambiguous.
In addition, the false discovery rates indicate that pathway analyses may contain false
positives, so the output values may not be accurate or biologically correct.
How accountable are the quantitative results from the two pathway-based analysis software
programs used? Some gene sets had different names in the two outputs yet may share some, but
not all, of their genes, which makes the results hard to reconcile. Further biological research
is necessary to establish whether certain genes belong to a certain gene set.
Are pathway analyses an efficient and significant means of leveraging GWAS of other
diseases besides pancreatic cancer? The goal of this study was to analyze the data with
as little bias as possible by applying standardized thresholds, so that the data is represented
fairly in the analysis. Since it has been claimed that complex diseases such as cancer are
driven by multiple rare pathways or genetic mutations, pathway analysis tools are a
possible solution to a limitation of GWAS, which identifies germ-line variants and SNPs but not
the underlying pathways involved.
Differences of How a Pathway is Defined
One possible explanation for the differences in outcome between GSA-SNP and ICSNPathway
when analyzing the same dataset is that the tools use differently updated Gene
Ontology databases, which differ in, among other things, gene sets, pathways, and
genes within each gene set, as well as different statistical algorithms that prioritize outcomes.
First, the ICSNPathway server searches for SNPs in linkage disequilibrium with the most
significant SNPs, based on the linkage disequilibrium of the European American (CEU)
HapMap population. By doing so, the genetics of human biology are better assessed. Second,
ICSNPathway annotates SNPs with functions in order to extract the pathways and genes that
correspond to the marked functional SNPs. Afterwards, pathway-based analysis of GWAS SNP
p-values is performed using the Gene Ontology database to identify candidate pathways and SNPs
that may correspond to a biological trait of interest such as disease.19
It is difficult to accurately compare
results from different pathway analysis tools, as there are different definitions of what
exactly a pathway is and what it contains. Since pathways interact with networks to
carry out biological functions, one pathway alone may not be enough to explain disease etiology.
Table 6 lists the disadvantages and limitations of pathway analysis methods
and how these methods can be improved so that future analyses
offer better results. Although well-defined pathways have yet to be established and
more consensus is needed on how a pathway is defined, pathway analyses can
make credible computational predictions of how biological processes (e.g. cancer metastasis)
are associated with cellular and molecular pathways.5
Pathway-based association approaches may be susceptible to false positive results, but findings
can be replicated with independent data sets. Pathway analysis is relatively
flexible, as it can be conducted on GWAS data from different genotyping platforms.
Pathway-based approaches are a possible way to identify novel genes or gene sets that
contribute to disease pathogenesis.
Table 6. Limitations of Pathway-Based Analysis Approaches
Problem: Outcome differences between GSA-SNP and ICSNPathway
Description:
• Different gene updates of gene builds: certain genes may not be recognized
• Different human reference gene sets: there may be a quantitative difference of the amount of
genes in reference sets
• Different statistical algorithms in tools: Z-Statistic, GSA Restandardization, HapMap
• Different understanding of pathways: names of pathways and gene sets, the amount of genes in
each gene set, and more minute classification of pathways
Explanation:
• Freely available pathway analysis tools may not be up-to-date and variations between
different tools may exist and affect outcome
• Reference gene sets should be easily accessed
• There needs to be more consensus on pathway classification, and greater understanding of how
behind biological processes, there are pathways that have networks and interact.
Problem: Over-represented Pathways
Description:
• There may be significantly overrepresented pathways within a pathway analysis tool, in which
gene sets appear to be importantly classified in output, but no correlation is found between
GWAS dataset input and pathway outcome results. Larger pathways tend to give off larger
outcomes and numbers that are compared become larger
Explanation:
• Statistically or programmatically, there may be flaws which are biased towards specific
pathways within their database and reference gene set list
In tumor progression, molecular changes occur and improve specificity without significantly
compromising sensitivity. Successful molecular screening can be defined as the identification of
genetic alterations that occur at a specified point in DNA. Pathway analyses are still relatively
new to genomic studies, as they take a more integrated approach that uses multiple data
types together in the same pathway-based tool. Since complex diseases such as cancer can
involve multiple pathways, including interactions between various affected genes associated
with disease development, it may be ideal to combine various analysis tools for genome-wide
association studies. An important limitation of pathway-based analysis of GWAS is the
incomplete annotation of the human genome. At present, the function of many human genes is
unknown, so those genes cannot be classified into pathways. Overall, there is no
defined standard for what a pathway is, and as a result, different software programs that use
different databases will produce different results. Another limitation of this study is
the lack of validation of results with the control data set. Improvements in the organization
and consensus of gene-set pathway databases may greatly improve understanding of the cellular
mechanisms of genes, pathways, and disease association.14
Conclusions and Future Work
There is evidence that candidate pathways from the GSA-SNP pathway analysis of
pancreatic cancer survival GWAS results are associated with pancreatic cancer. The reliability
of results from the pathway analysis was assessed through a comprehensive literature survey
using the Gene Ontology terms of gene sets and pancreatic cancer as key terms. Each candidate
pathway was supported by multiple published papers reporting an association between the pathway
and the disease. The example dataset of KARE Height GWAS data served as a control
dataset for this study to establish a case-control experiment. However, there are no established
ways to determine the significance of association between pathways, gene sets, and their
relationship to a specific disease or biological trait.
With the use of pathway analysis tools to further analyze genome-wide association study
results, overrepresented cell processes can be classified by significance. Pathway analysis is a
potential way to gain greater understanding and value from GWAS, and can prove
useful for acquiring greater insight into disease etiology, risk, diagnosis, and survival time.5
Prior studies have suggested that GWAS are insufficiently powered to detect small
main effects (overrepresentations) of genes, and that gene-gene interactions may have a
significant role in disease pathogenesis, so GWAS alone do not realize the full potential of
associating genes in pathways with disease. The application of pathway analyses following GWAS
can be considered a novel extension of traditional genome-wide association study methods.20
GWAS aim to find associations between disease phenotypes and genetic alterations.
Pathway analyses offer a simple supplement to traditional genetic
association studies. As a result, pathway analyses may identify relevant gene sets
and subsets in pancreatic cancer phenotypes. The use of pathway-based analysis in this study
proved useful in examining the effects of a pathway or group of genes on disease through the
testing of established gene sets of the Gene Ontology database. Results from this study offer
insight into how pathway analysis methods could potentially increase the power of GWAS results
to detect underlying associated pathways.21
By studying the results of a GWAS for pancreatic cancer survival from a population of
unrelated patients, it can be determined that patients with pancreatic cancer share a phenotype
of hereditary cancer. Pathway-based approaches extract more biological information from GWAS,
making GWAS results more powerful for gaining insight into disease pathogenesis. Further
development of sequencing and analysis tools will improve the power of pathway
analysis and genome-wide association studies.
Future steps for pathway analysis include improving computational predictions of cellular
processes from genomic and molecular biology data, as presented in Table 6. Research must
continue into the molecular components that underlie the biology of pancreatic cancer.
Developing more sensitive and specific molecular approaches to understanding disease is
essential to gaining knowledge of the pancreatic cancer progression model. 22
Together, GWAS and pathway analysis provide insight into the individual genetics of cancer,
which can improve early prevention and diagnosis. Knowledge of the underlying biological,
molecular, and functional pathways can allow novel gene-targeted therapies to be designed and
developed. GWAS can identify associations between disease and common alterations within a
genome using high-density SNP markers. 10
A preliminary study has been carried out to focus on the genes within the gene sets
identified from the pancreatic cancer survival data, to determine which genes have the
strongest statistical correlation with the biology of pancreatic cancer and related diseases
(e.g. diabetes and pancreatic inflammation). This study extends the pathway analysis results;
its statistical methods were based on the p-values and the number of genes within each gene set
or pathway.
Further pathway analyses of GWAS data can be performed to validate the results of this study
and to draw firmer conclusions from more reliable, and therefore more significant, results. For
example, a larger sample of SNPs can be used as input, with the same analysis methods, to build
a broader library of disease pathogenesis information. Comparing the associations of somatic
mutations with those of germ-line mutations can also offer greater comprehension of disease
mechanisms. Another future improvement would be greater consensus on what constitutes a pathway,
so that databases of p-values, pathways, and gene sets become better defined and yield more
consistent results regardless of differences between software algorithms. A further extension
would be to vary the standardized parameters. As the SNP-to-gene mapping window and the maximum
p-value threshold are changed, pathway analysis may produce results that differ substantially,
both quantitatively and biologically. Questions that further research could answer include:
(1) Can pathway analyses determine and rank which genes within gene sets are most highly
associated with disease incidence? (2) Which pathways or genes interact with one another to
trigger disease or advance disease stages? (3) Are single-locus mutations identifiable and
practical enough for understanding disease? Better insight into pancreatic cancer can be gained
as pathway analysis methods and concepts advance. If a locus and its gene are identified, the
gene can be targeted in the search for therapies to treat and cure cancer.
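The sensitivity to parameters noted above can be illustrated with a minimal Python sketch. It maps SNPs to genes within a padding window around each gene and lists the genes that carry at least one SNP below a p-value threshold, then repeats this for several settings. The defaults mirror the ±20 kb window and 0.001 cutoff used as standard parameters in this study; the function names, coordinates, and identifiers are hypothetical toy values, and the code is not taken from either software program.

```python
def map_snps_to_genes(snps, genes, window=20_000):
    """Assign each SNP to every gene whose span, padded by `window` bp, contains it."""
    mapping = {}
    for snp_id, (chrom, pos) in snps.items():
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                mapping.setdefault(gene, []).append(snp_id)
    return mapping


def significant_genes(mapping, pvalues, threshold=1e-3):
    """Genes with at least one mapped SNP whose p-value is below `threshold`."""
    return sorted(
        gene for gene, snp_ids in mapping.items()
        if any(pvalues[s] < threshold for s in snp_ids)
    )


# Hypothetical toy data: SNP positions, gene spans, and SNP p-values.
snps = {"rs1": ("chr1", 95_000), "rs2": ("chr1", 150_000), "rs3": ("chr1", 260_000)}
genes = {"GENE_A": ("chr1", 100_000, 120_000), "GENE_B": ("chr1", 200_000, 220_000)}
pvalues = {"rs1": 5e-4, "rs2": 2e-2, "rs3": 8e-3}

for window in (0, 20_000, 50_000):
    for threshold in (1e-3, 1e-2):
        hits = significant_genes(map_snps_to_genes(snps, genes, window), pvalues, threshold)
        print(f"window={window:>6} bp  threshold={threshold:g}  genes={hits}")
```

In this toy example no gene is reported with a window of 0, one gene with the ±20 kb window, and two genes only when both the window and the threshold are relaxed, which shows how the choice of parameters can change the gene list passed on to pathway analysis.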
Developing computational tools to further analyze biological databases such as Gene Ontology
will allow greater understanding of results from genomic studies. Combined with a comprehensive
examination of the relevant published literature, additional experimental validation can confirm
the results of algorithms that analyze genomic and gene-profiling data, allowing direct
comparison and accounting for the biological complexity of cancer survival mechanisms involving
specific genes, gene types, and gene interactions. Ultimately, if the relevant signaling pathways
for pancreatic cancer and other types of cancer are identified, software programs can be
developed to provide accurate statistical analysis of them. Gene networks may also reveal
abnormalities in gene signaling, opening the possibility of rational therapeutic gene selection
and of finding a potential identifying mechanism for pancreatic cancer patients.
References
1. Ries LAG, Eisner MP, Kosary CL, Hankey BF, Miller BA, Clegg L, Mariotto A, Feuer EJ,
Edwards BK (eds). SEER Cancer Statistics Review, 1975-2002, National Cancer Institute.
Bethesda, MD, <http://seer.cancer.gov/csr/1975_2002>
2. Thomasset S.C., Lobo D.N. “Pancreatic Cancer”. Hepatobiliary Surgery II (2010) 28:5, 198-
204.
3. Klein, Alison P, et al. “Prospective Risk of Pancreatic Cancer in Familial Pancreatic Cancer
Kindreds.” Cancer Research 64.7 (2004): 2634-8. PubMed.
4. Klein, Alison P, et al. “Prospective Risk of Pancreatic Cancer in Familial Pancreatic Cancer
Kindreds.” Cancer Research 64.7 (2004): 2634-8. PubMed. Web. 6 September 2011.
<http://cancerres.aacrjournals.org/content/64/7/2634.long>.
5. Elbers C.C., Eijk K.R., Frake L., Mulder F., Schouw Y.T., Wijmenga C., Onland-Moret, N.C.
“Using Genome-Wide Pathway Analysis to Unravel the Etiology of Complex Diseases”. Genetic
Epidemiology 33: 419-431 (2009). Doi: 10.1002/gepi.20395
6. Visscher, Peter M., WG Hill, and NR Ray. “Heritability in the Genomics Era – Concepts and
Misconceptions.” Nature Reviews. Genetics 9.4 (2008): 255-66. PubMed. Web. 26 Aug. 2010.
<http://www.nature.com/doifinder/10.1038/nrg2322>.
7. Li, J, et al. “A Combined Analysis of Genome-Wide Association Studies in Breast
Cancer.” Breast Cancer Research and Treatment (Sept. 2010): PubMed. Web. 21 Aug. 2011.
<http://www.springerlink.com/content/kl32p6271h141716/>.
8. Naj, AC, et al. “Dementia Revealed: Novel Chromosome 6 Locus for Late-onset Alzheimer
Disease Provides Genetic Evidence for Folate-pathway Abnormalities.” PLoS Genetics 6.9
(2010): e1001130. PubMed. Web. 15 Aug. 2010.
<http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001130>.
9. Lambert, JC, et al. “Implication of the Immune System in Alzheimer’s Disease: Evidence
from Genome-Wide Pathway Analysis.” Journal of Alzheimer’s Disease: JAD 20.4 (2010):
1107-18. PubMed. Web. 7 September 2011.
<http://iospress.metapress.com/content/mj6t4h073843501l/>.
10. Galvan, Antonella, JP Ioannidis, and TA Dragani. “Beyond Genome-Wide Association
Studies: Genetic Heterogeneity and Individual Predisposition to Cancer.” Trends in Genetics:
TIG 26.3 (2010): 132-41.
11. Cantor, Rita M., K Lange, JS Sinsheimer. “Prioritizing GWAS Results: A Review of
Statistical Methods and Recommendations for Their Application.” American Journal of Human
Genetics 86.1 (2010): 6-22. PubMed. Web. 5 September 2011.
<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801749/?tool=pubmed>.
12. Raimondi S., Lowenfels A.B., Morselli-Labate A.M., Maisonneuve P., Pezzilli R.
“Pancreatic cancer in chronic pancreatitis; aetiology, incidence, and early detection”. Best
Practice & Research Clinical Gastroenterology 24 (2010) 349-358.
13. Vitone L.J., Greenhalf W., McFaul C.D., Ghaneh P., Neoptolemos J.P. “The inherited
genetics of pancreatic cancer and prospects for secondary screening”. Best Practice & Research
Clinical Gastroenterology. 20 (2006) 253-283.
14. Li J, et al. “A Combined Analysis of Genome-Wide Association Studies in Breast Cancer.”
Breast Cancer Research and Treatment. (2010) PubMed.
<http://www.springerlink.com/content/kl32p6271h141716/>.
15. Lambert JC, Boley BG, Chouraki V, Heath S, Zelenika D, Fievet N, Hannequin D, Pasquier
F. “Implication of the Immune System in Alzheimer’s Disease: Evidence from Genome-Wide
Pathway Analysis.” Journal of Alzheimer’s Disease 20 (2010) 1107-1118. Doi: 10.3233/JAD-2010-100018
16. Wang, K., Li M., Hakonarson, H. “Analysing biological pathways in genome-wide
association studies”, Nature Reviews, Volume 11, December 2010. Doi: 10.1038/nrg2884
17. Nam, D., Kim, J., Kim, S.Y., Kim, S. “GSA-SNP: a general approach for gene set analysis of
polymorphisms”. Nucleic Acids Research 2010, Vol. 38, W749-W754, doi:10.1093/nar/gkq428
18. K. Zhang, S. Chang, et al. (2011). "ICSNPathway: identify candidate causal SNPs and
pathways from genome-wide association study by one analytical framework." Nucleic Acids
Res. 39(suppl 2): W437-W443.
19. Zintzaras E, Lau J (2008) Trends in meta analysis of genetic association studies. J Hum
Genet 53, 1-9.
20. Zhao et al. “Pathway-based analysis using reduced gene subsets in genome-wide association
studies”. BMC Bioinformatics 2011, 12:17. Doi: 10.1186/1471-2105-12-17
21. Vitone L.J., Greenhalf W., McFaul C.D., Ghaneh P., Neoptolemos J.P. “The inherited
genetics of pancreatic cancer and prospects for secondary screening” Best Practice & Research
Clinical Gastroenterology, Vol. 20. No. 2. 253-283, 2006. Doi: 10.1016/j.bpg.2005.10.007
22. Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology
annotations. Nature Reviews Genetics 9, 509-515.
23. Torkamani, Ali, EJ Topol, and NJ Schork. “Pathway Analysis of Seven Common Diseases
Assessed by Genome-Wide Association.” Genomics 92.5 (2008): 265- 72. PubMed.
24. Medina, I., Montaner, D., Bonifaci, N., Pujana, M.A., Carbonell, J., Tarraga, J., Al-Shahrour,
F. and Dopazo, J. (2009) Gene set-based analysis of polymorphisms: finding pathways or
biological processes associated to traits in genome-wide association studies. Nucleic Acids Res,
37, W340-344.
25. Wang, K., Li, M. and Hakonarson, H. (2010) Analysing biological pathways in genome-wide
association studies. Nat Rev Genet, 11, 843-854.
26. Zhong, Hua, et al. “Integrating Pathway Analysis and Genetics of Gene Expression for
Genome-wide Association Studies.” American Journal of Human Genetics 86.4 (2010): 581-91.
PubMed. Web. 30 August 2011.
<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2703874/?tool=pubmed>.
27. “Whole Genome Association Analysis Toolset.” PLINK. Web. 28 August 2011.
<http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml>.
28. You L, Chen G, Zhao Y.P. Core signaling pathways and new therapeutic targets in pancreatic
cancer. Chinese Medical Journal 2010; 123 (9): 1210-1215.
Doi: 10.3760/cma.j.issn.0366-6999-2010.09.020
29. “ICSNPathway”. Web. 28 August 2011. <http://icsnpathway.psych.ac.cn>
  • 1. 1 Pathway-Based Approach to Analyze Genome-Wide Association Study of Pancreatic Adenocarcinoma Survival Using Pre-Defined Gene Sets and Pathway Analysis Software by Jeanette Wong Jason A. Willis – Memorial Sloan Kettering Cancer Center Robert J. Klein, Principal Investigator – Memorial Sloan Kettering Cancer Center Advisor: Dr. Erin O’Leary Ph.D.
  • 2. 2 Pathway-based Approach to Analyze Genome-Wide Association Study of Pancreatic Adenocarcinoma Survival Using Pre-Defined Gene Sets and Pathway Analysis Software Jeanette Wong Mentor: Jason Willis, Memorial Sloan Kettering Cancer Center Robert J. Klein, Memorial Sloan Kettering Cancer Center Advisor: Dr. O’Leary, Bronx High School of Science Genome wide association studies (GWAS) have identified single-loci markers and SNPs to be associated with pancreatic cancer; however, complex diseases such as pancreatic cancer develop due to multiple rare genetic mutations or variations, rather than by a single SNP or gene mutation. Pathway analyses provide supplemental information from GWAS results to further analyze and understand disease etiology. With the use of two publically-available pathway analyses software programs, GSA-SNP and ICSNPathway, standard parameters are set and data is analyzed with the use of computational algorithms. The goal of this research is to assess results from GWAS of pancreatic cancer survival data to identify pathways associated to disease progression in addition to locate genetic mutations that predispose some individuals to pancreatic cancer and influence a patient’s overall prognosis. Results from this study provide insight to mechanisms of pancreatic cancer and their relationship to candidate pathways derived from pathway analyses. A literature survey confirms the significance and relevance of candidate pathways to pancreatic cancer. Pathway-based Approach to Analyze Genome-Wide Association Study of Pancreatic Adenocarcinoma Survival using Pre-Defined Gene Sets and Pathway Analysis Software Introduction Pancreatic adenocarcinoma kills 95.4% of patients diagnosed with the disease within five years after initial diagnosis. 1 Pancreatic cancer is one of the most fatal of cancers, as symptoms
  • 3. 3 do not become apparent until late stages, resulting in only 10-20% of patients eligible to be candidates for resection. After resection, the median survival time is approximately 11-20 months, and the 5-year survival rate is approximately 7-25%. Resection is the only treatment that has the potential to cure pancreatic adenocarcinoma. While treatments such as chemotherapy may improve survival by 10-15%, they do not have the ability to cure pancreatic cancer. Patients who are diagnosed at a late stage of pancreatic cancer are usually not eligible for resection. These patients have a median survival time of 6-11 months after diagnosis. Patients with metastatic pancreatic cancer have a median survival time of 2-6 months.2 The key survival of pancreatic adenocarcinoma would be early detection and diagnosis of the disease, when resection is a possible treatment, with potential for a cure to the cancer. Approximately 10% of patients with pancreatic cancer have a family history of pancreatic cancer.3 Familial pancreatic cancer is transmitted through autosomal dominant means with approximately 17-19% of families with BRCA2 mutations. Pancreatic cancer may also result from other disease syndromes such as familial atypical multiple mole melanoma syndrome (FAMM) and Peutz-Jeghers syndromes. Molecular alterations such as Kras (proto-oncogene) activation, p53 (tumor-suppressor gene) inactivation, SMAD4, and p16 signaling can be found in approximately 80% of pancreatic adenocarcinoma patients.2 Known germ-line mutations are responsible for approximately 10-20% of clustering of pancreatic cancer in families with an inherent history of the disease. 4 Germ-line refers to the DNA that is inherited from parents in offspring, whereas somatic mutations arise not due to genetic changes of inherited DNA. Pathway-based approaches examine whether a group of genes in the same functional biological pathway is associated with a trait of interest for disease. 
5, 6-9 Previous studies have hypothesized
  • 4. 4 that disease risk may possibly be triggered and caused by a variety of numerous rare variants, and pathway analysis leverages more non-obvious genetic factors associated to disease. 10, 11 Previous studies have shown that molecular pathways leading from benign to malignant pancreatic cancer have a role in metastasis and therefore survival.12 A greater understanding of the molecular pathogenesis of pancreatic cancer may allow for the development of novel targeted treatments and identification of early precursor lesions.13 Genome wide association (GWA) studies have typically focused on the analysis of single markers, which have found an association between a single-nucleotide polymorphism (SNP) marker and trait of interest. GWAS studies have an essential goal to search for the genetic mechanisms that drive the disease, in which germ-line mutations are to be identified to associate to loci that are found to be associated with disease. Pathway-based approaches have been developed, using biological knowledge on gene function to generate more power from genome-wide association study (GWAS) result data. Previous GWAS Pathway analyses have been completed to target other diseases besides pancreatic cancer such as breast cancer and Alzheimer’s disease. 14, 15 In pathway analysis, ‘pathway’ is defined as a set of related genes, and not necessarily a physically networked pathway. With the use of prior biological information and knowledge on overrepresented pathways in GWAS data, pathway classification analysis can help prioritize pathways that are most likely to be associated with disease. By incorporating gene networks and pathway classification tools for analysis of GWAS data, molecular pathways can bring single- locus genome-wide association studies further in depth. 
There are currently several available pathway classification analysis tools and databases; these tools have genes sorted into pre- defined pathways of cellular processes based on biological genomic and molecular information. Parts of genomes are inherited together, and every SNP gives information about several other
  • 5. 5 genetic variations on a specific chromosome. Considering the linkage disequilibrium (LD) patterns within a genome, for pathway analysis, a SNP is mapped back to an LD gene block, which contains several genes within a specified parameter. In pathway analysis, a threshold p- value is selected in order to prioritize output. 5 Larger pathways containing a significantly higher number of genes within gene sets will lead to larger numbers of genotyped SNPs are expected to show more associated SNPs by chance alone. 11 Pathway-based approaches assess whether test statistics for a group of related genes has consistent yet moderate deviation from chance. Genes are not fully functional in isolation. Complex molecular pathways tend to be more related to disease susceptibility and disease progression. In pathway based association tests for GWAS, a database list of predefined gene sets for pathways have been created based on prior biological knowledge. The significance of each pathway can be summarized based on association of markers in or near genes that are components of a specific pathway. There may be multiple related genes in the same functional pathway that confer disease progression and pathogenesis. Pathway analysis is complementary to the conventional GWAS by identifying additional susceptibility genes; pathway analysis can be used to understand missing heritability in genome-wide association studies.16 It has been presented as a problem that by testing one single gene marker at a time, coherent patterns cannot be found among significant genes, making biological interpretation difficult in GWAS. Gene set analysis (GSA) methods use different pre-defined gene sets that are grouped together based on their biological function and expressions. GSA determines the significance of pre-defined sets of genes with respect to an outcome variable. In this study, the outcome variable is the quantitative biological analysis of disease survival. 
Gene sets have the ability to coordinate expression patterns of genes of interest. The essential goal for genome-wide
  • 6. 6 association studies is to prioritize the biological functions or related biological networks based on a targeted biological interest trait or area. Pre-defined gene sets or pathways can further better define the results from a GWAS. 17 The goal of research is to assess the significance of pathways from germ-line mutation studies to define and identify significant pathways associated to pancreatic cancer. Results provide insight into mechanisms of pancreatic cancer and their relationship to candidate pathways derived from pathway-based analyses. It is hypothesized that pathway analysis based on results from genome-wide association studies, will be a reliable indicator of candidate pathways associated to the development and metastasis of pancreatic cancer. Materials and Methods Input for Pathway Analysis - Pancreatic Cancer Survival GWAS A genome-wide association study (GWAS) was conducted prior to pathway analysis. DNA samples from 252 patients diagnosed with pancreatic adenocarcinoma were collected via blood or cheek cell samples. The 252 patients were enrolled in a study at a research institution and consented to offer a DNA sample. The DNA was genotyped using an Illumina CNV370-duo SNP genotyping array (~340k SNP markers). After DNA samples of patients were collected, clinical information of each patient was tracked (e.g. survival time, treatment plan). Results of the GWAS are p-values assigned to SNPs, without individual-level genotypes. Input for Pathway Analysis – Example Dataset: Height GSA-SNP provided an example dataset of 100 samples of DNA in the format of SNPs and p-values from a Korean population for height (PGWAS < 4x10-6 ). SNPs were obtained by computing labels of SNP microarray data from the Korean Association Resources (KARE) project, and then with the use of PLINK software, genotyping was completed. The genotypes of
  • 7. 7 a total of 2,168,896 SNPs were imputed using PLINK and 799,492 of them passing PHeight > 1x10-6. The p-values for all resulting SNPs were gathered and used as an input variable. 17 Standard Thresholds and Parameters for Pathway Analyses A p-value of the input data is a probability statement that tests the null hypothesis. For example, as a p-value is smaller, the evidence against a null hypothesis is stronger. The p-value is compared to a significance value. For pathway analyses of this study, the standard cutoff point is 0.001, 1x10-3 and any p-value below this threshold is determined to be statistically significant. When SNPs are being mapped to genes, a SNP would be located between a 5’ and 3’ ends of the first and last exons of a gene, as it is assigned to a latter. A SNP located within ±20kb of the 5’ and 3’ ends of the first and last exons of a gene is always assigned to a latter (±20kb upstream or downstream of the gene), in order to take account of surrounding regulatory regions/linkage disequilibrium (LD) neighborhoods. (Linkage disequilibrium occurs between disease allele and marker alleles; GWAS can identify disease-associated alleles when mapped from significant SNPS). 10 If a given SNP was assigned to more than one gene, the SNPs are subject to being reanalyzed. The Gene Ontology gene set database is used to provide a broad spectrum of gene sets for genomics research testing enrichment. 1 The standard 10-200 (minimum-maximum) gene set size of each pathway/gene set was selected to avoid overly narrow or overly broad functional categories in the Gene Ontology database. The q-value on GSA-SNP represents the False Discovery Rate (FDR) for the analysis as a correction method to correct false positive results. The standard FDR cutoff for pathway-based analysis was set at ≤0.05. 
GSA-SNP: Gene Set Analysis with SNP Input

Gene set analysis (GSA) has been introduced to genome-wide association studies in order to identify associations between groups of genes that share a common biological function
  • 8. and disease. With GSA, the power of a GWAS can be increased substantially, because association patterns may be found at the level of gene sets. The data input windows are shown in Figures 1, 2, 3, and 4, which correspond to the steps taken to enter properly formatted data. GSA-SNP is computational software that is freely available, along with an example dataset, at http://gsa.muldas.org [17]. The input used for GSA-SNP was a list of p-values, one for each SNP from a GWAS. A gene set analysis begins by taking the “–log” of every SNP p-value. A feature of GSA-SNP is the use of the “k-th best p-value” (k = 1, 2, 3, 4, or 5) to score each gene, which allows gene scores to be more evenly distributed. For this experiment k=2, the second-best SNP in each gene, was used as the standard for summarizing the values of multiple SNPs. With k=1, each gene's score would depend only on its single best SNP. Each SNP is mapped to its nearest gene within 20 kb. Larger k-values tend to lower the power of the results. [17]

Procedures for using GSA-SNP

1. Run GSA-SNP: Execute run.sh (Unix/Linux) on a computer. (Figure 1)
2. Breakdown of pathway analysis using the GSA-SNP program
- Click the “…” button and choose a data file.
- Click the “Upload” button to detect the data type (SNP, Gene, or Haplotype).
- The program will show relevant input options.

The GSA-SNP input is a SNP data file. The program automatically detects the data type by reading the first ten lines of the input file. The row identifier for SNP data is rs#####. The first column of the input file is the rs number of a SNP, and the second column is the p-value for that SNP. Figure 1 shows the initial window after the GSA-SNP Java file is executed. Figure 2 shows the pop-up window after the “open” button is clicked. Figure 3 shows how parameters
  • 9. for analysis are set. In this experiment, the parameters defined as standards were entered. Figure 4 shows the window when all data and parameters have been entered completely and correctly and the pathway analysis is ready to be run.

Figure 1.

Figure 2. Click the “…” button to manually select the input data file. Select a file. Click the “open” button, then “upload”. Data parameters: GSA-SNP applies “–log” to every p-value in the input data. For SNP data, padding is used when mapping SNPs to genes in order to account for LD; ±20 kb is the standard threshold.

Figure 3. Gene set parameters for the Gene Ontology (GO) database. Gene set size is set to range from 10 (minimum) to 200 (maximum) genes per gene set to avoid overly narrow or overly broad functional gene sets. The q-value is the false discovery rate (FDR), and its default cutoff is ≤ 0.05.

Figure 4. The analysis begins when “Run” is clicked. Progress is shown in the bottom bar of the program window. When the analysis is complete, results appear on the right of the program window. Within GSA-SNP, the Z-statistic method is employed to provide a corrected p-value. In the output, the “z-score” column holds the result of this algorithm.

3. Results: When the computation is complete, the result appears on the right side of the program window. Results are formatted into columns and rows. The results are
  • 10. formatted by: gene set name, gene count in each gene set, gene set size, z-score, corrected p-value (q-value), and names of genes within each gene set.

Figure 5. The computation results of GSA-SNP for the pancreatic cancer survival data, with the Z-statistic method and all standardized parameters applied. Results are ordered by decreasing significance of pathways, based on the p-value of each gene set.

ICSNPathway: Identification of Candidate Causal SNPs and Pathways

ICSNPathway is a freely available web server developed to analyze SNPs from a GWAS and identify pathways associated with a trait of interest. ICSNPathway takes a distinctive approach to linkage disequilibrium (LD) analysis: it uses HapMap population data to map SNPs to genes more accurately for pathway analysis. Figure 6 shows the ICSNPathway web page with all the parameters set for the pathway analysis. Figure 6 is not the initial page display, but it closely resembles the input page shown in Figure 7. To show what happens within the ICSNPathway
  • 11. program itself, Figure 8 shows how the data are analyzed and how the chosen parameters are applied. Results of the pathway analysis can be downloaded as a text file, as shown in Figure 9. Output data are ordered from lowest to highest p-value. Given properly prepared input data and parameters, ICSNPathway completes a run within minutes. [18]

Figure 6. Parameters set for the KARE Height data input. The standardized parameters are applied to the analysis.

Figure 7. The home page of the ICSNPathway web server. All input information is completed before the analysis is started by clicking “RUN”. The GWAS SNP p-value file is uploaded, the LD neighborhood parameters are selected, and the standardized parameters chosen for this experiment are applied.
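As a rough illustration of the GSA-SNP scoring described earlier (the k-th best p-value per gene, followed by a Z-statistic per gene set), the following Python sketch may help. Several details are assumptions, not statements about the GSA-SNP implementation: the logarithm is taken as base 10, genes with fewer than k SNPs are skipped, and the Z-statistic is written in the common form (mean score of the set minus mean score of all genes) divided by (standard deviation of all gene scores / square root of the set size). The function names and the toy scores are hypothetical.

```python
import math
from statistics import mean, pstdev


def gene_score(snp_pvalues, k=2):
    """Return -log10 of the k-th best (smallest) SNP p-value of one gene.

    Returns None when the gene has fewer than k SNPs (an assumption made
    for this sketch; the source does not say how such genes are handled).
    """
    if len(snp_pvalues) < k:
        return None
    return -math.log10(sorted(snp_pvalues)[k - 1])


def gene_set_z(gene_scores, gene_set):
    """Z-statistic of one gene set against the background of all scored genes.

    gene_scores: {gene_name: score} for every scored gene
    gene_set:    iterable of gene names belonging to the set
    """
    background = list(gene_scores.values())
    mu, sigma = mean(background), pstdev(background)
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members or sigma == 0:
        raise ValueError("gene set has no scored genes or scores do not vary")
    return (mean(members) - mu) / (sigma / math.sqrt(len(members)))


# Toy values, for illustration only.
second_best = gene_score([0.01, 0.001, 0.5], k=2)   # uses p = 0.01
scores = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
z = gene_set_z(scores, {"C", "D"})
print(second_best, z)
```

With k=2 the gene in the example is scored by its second-smallest p-value (0.01) and not by its best one (0.001), which is the behavior the text attributes to the k=2 setting.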
  • 12. Figure 8. Diagram of how ICSNPathway functions overall. [18]

Figure 9. After the ICSNPathway analysis is completed, the output is listed on the online result page. The output can also be downloaded as a text file. The results are organized in columns: Index (ranking), Candidate causal pathway, Gene set URL, Description of Gene Set, Nominal P-value, and FDR.

Results

Output from the GSA-SNP software is formatted as a spreadsheet of columns and rows, so that the data can be sorted in ascending or descending order by p-value, z-score, and so on. Table 1 compares the output values of the two pathway analysis tools in order to assess how stable pathway analysis results are across different tools. Figures 10 and 11 are graphs showing how far the output values of the two programs differ or agree.

Results of Two Different Pathway Tools Analyzing the Same GWAS Data Input
  • 13. All the data presented in the results of this study are part of a broader genetics study analyzing the effect of germ-line pathways that trigger, or are associated with, an inherited trait or the development of disease. [5]

Table 1. Comparison of the results of pathway analysis using two different software programs, GSA-SNP and ICSNPathway. Similar pathway names, their rankings, and their p-values are organized in the table.

Figure 10. Comparison of the rankings of overlapping pathways appearing in the results of both GSA-SNP and ICSNPathway for the control KARE Height dataset. There is no consensus in rankings, even though the same parameters were set.

Figure 11. Comparison of the p-values of overlapping pathways appearing in the results of both GSA-SNP and ICSNPathway for the control KARE Height dataset, demonstrating how the application of different algorithms yields different computation results.

Comparison of GSA-SNP and ICSNPathway

The results from ICSNPathway are less specific than those of GSA-SNP. Because the same standardized parameters were used, similar results might have been expected, but the overlap between the two outputs is minimal. GSA-SNP has a more finely divided Gene Ontology database, in
  • 14. which certain pathways are classified in greater detail and carry different p-values. This may have skewed the comparison of the two software programs. Regardless, there is some consensus on the top pathways in the two sets of output. Table 4 shows the output values of the GSA-SNP pathway analysis, ordered by ascending p-value. Table 5 shows the top-ranked pathways and their statistical values as computed by GSA-SNP. Figure 10 shows the rankings of overlapping pathways appearing in the results of both GSA-SNP and ICSNPathway. Figure 11 compares the p-values of the overlapping pathways in the results of both programs, demonstrating that the values differ.

Table 4. List of the highest-ranked Gene Ontology categories for SNP association with the GWAS pancreatic cancer survival data, with p-values ≤ 0.001 from the GSA-SNP pathway analysis. 253 pathways appeared in the results of the GSA-SNP pathway analysis of the pancreatic cancer survival GWAS results, and only the most highly significant pathways were selected for a literature survey in search of relevance to pancreatic cancer.
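One way to put a number on the ranking disagreement between the two tools is a rank correlation over the pathways that appear in both outputs. The Python sketch below is not part of the original study: it assumes each tool's output is available as a list of pathway names ordered from most to least significant, that names are unique within a list, and that overlapping pathways can be matched by exact name (in practice the names differ between tools and would need to be harmonized first). The pathway names are placeholders, not results.

```python
def overlap_rank_correlation(ranking_a, ranking_b):
    """Spearman rank correlation over pathways present in both rankings.

    ranking_a, ranking_b: lists of pathway names, most significant first.
    Shared pathways are re-ranked within the overlap, so there are no ties
    and the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    Returns (shared pathways in the order of ranking_a, rho).
    """
    shared = set(ranking_a) & set(ranking_b)
    order_a = [p for p in ranking_a if p in shared]
    order_b = [p for p in ranking_b if p in shared]
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two shared pathways")
    position_b = {p: i for i, p in enumerate(order_b)}
    d_squared = sum((i - position_b[p]) ** 2 for i, p in enumerate(order_a))
    return order_a, 1 - 6 * d_squared / (n * (n * n - 1))


# Placeholder pathway names, for illustration only.
tool_a = ["pathway_x", "pathway_y", "pathway_z", "pathway_w"]
tool_b = ["pathway_z", "pathway_q", "pathway_y", "pathway_x"]
shared, rho = overlap_rank_correlation(tool_a, tool_b)
print(shared, rho)
```

A value of rho near 1 would mean the two tools order the shared pathways the same way, a value near 0 would mean no relationship, and a negative value would mean the orders tend to be reversed.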
  • 15. Table 5. After the top-ranked pathways for pancreatic cancer were selected from the GSA-SNP pathway analysis, a literature survey was completed. The survey used search engines to find literature containing terms such as pancreatic cancer, metastasis, survival, progression, and the name of a pathway. The table cites one example of published literature found in the search for each pathway, as evidence supporting the association between the pathway and pancreatic cancer survival factors (metastasis, tumor growth, cancer progression, cell regulation, cellular invasion, etc.). The top ten most strongly associated pathways are presented.

Discussion

Interpretation of Results

The results of this study identify candidate pathways associated with pancreatic cancer survival, metastasis, carcinogenesis, and the underlying biological and genetic mechanisms. The analysis does not necessarily identify the most highly associated pathways accurately, because the rankings of pathways do not correlate with the pathogenesis of the targeted disease. This study supplements other findings within the same research discipline, in which somatic mutations have been described as predominantly responsible for the development, risk, and metastasis of pancreatic cancer.

Analysis in Context

Do pathway analyses effectively further the findings from genome-wide association studies?
  • 16. Pathway analyses further the findings from genome-wide association studies to an extent. Because the two programs produced numerous differences in output rank even though the same input data and parameters were used, the results are ambiguous. In addition, the false discovery rates show that pathway analyses may contain false positives, so the output values may not be accurate or biologically correct.

How accountable are the quantitative results from the two pathway-based analysis software used?

A problem arises because some gene sets carry different names in the two outputs yet may contain some, though not all, of the same genes. Further biological research is necessary to establish whether particular genes belong to a particular gene set.

Are pathway analyses an efficient and significant means of leveraging GWAS of other diseases besides pancreatic cancer?

The goal of this study was to analyze the data with as little bias as possible by applying standardized thresholds, so that the data are represented as faithfully as possible in the analysis. Since it has been claimed that complex diseases such as cancer are driven by multiple rare pathways or genetic mutations, pathway analysis tools are a possible answer to a limitation of GWAS, which identifies germ-line variants and SNPs but not the underlying pathways involved.

Differences of How a Pathway is Defined

One possible explanation for the differences in outcome between GSA-SNP and ICSNPathway when analyzing the same dataset is that the tools rely on differently updated versions of the Gene Ontology database. These differences include, but are not limited to, the gene sets, the pathways, and the genes within each gene set, together with the use of different statistical algorithms to prioritize results. First, the ICSNPathway server searches for SNPs in linkage disequilibrium with the most significant SNPs
  • 17. based on the linkage disequilibrium structure of the HapMap CEU population (Utah residents with Northern and Western European ancestry). This grounds the SNP-to-gene mapping in observed human LD patterns. Second, ICSNPathway annotates SNPs with functional information in order to extract the pathways and genes that correspond to the marked functional SNPs. Afterwards, pathway-based analysis of the GWAS SNP p-values is performed using the Gene Ontology database to identify candidate pathways and SNPs that may correspond to a biological trait of interest, such as a disease. [19]

It is difficult to compare results from different pathway analysis tools accurately, because there are different definitions of what exactly a pathway is and what it contains. Since pathways interact with networks to carry out biological functions, a single pathway may not be enough to account for disease etiology. Table 6 lists the disadvantages and limitations of pathway analysis methods and how these methods can be improved so that future analyses yield better results. Although well-defined pathways have yet to be established and more consensus is needed on how a pathway is defined, pathway analyses are able to make credible computational predictions of how biological processes (e.g. cancer metastasis) are associated with cellular and molecular pathways. [5] Pathway-based association approaches may be susceptible to false positive results, but findings can be replicated with independent datasets. Pathway analyses are relatively flexible, as they can be conducted on GWAS data from different genotyping platforms. Pathway-based approaches are a possible means of identifying novel genes or gene sets that contribute to disease pathogenesis.
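The false positive concern raised above is what the FDR cutoff (q ≤ 0.05) is meant to control. The source states only that the q-value represents the FDR and does not name the correction procedure, so the following Python sketch uses the widely used Benjamini-Hochberg adjustment as an assumed stand-in. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), input order kept.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then a
    running minimum is taken from the largest rank downwards so that the
    adjusted values never decrease as the raw p-values increase.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical gene set p-values, for illustration only.
pathway_pvalues = [0.01, 0.04, 0.03, 0.005]
qvalues = benjamini_hochberg(pathway_pvalues)
significant = [q <= 0.05 for q in qvalues]
print(qvalues, significant)
```

Gene sets whose q-value is at or below 0.05 would then be reported; replication in an independent dataset, as suggested above, remains a separate check that no correction procedure replaces.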
Table 6. Limitations of Pathway-Based Analysis Approaches

Problem: Outcome differences between GSA-SNP and ICSNPathway
Description:
• Different updates of gene builds: certain genes may not be recognized
• Different human reference gene sets: the number of genes in the reference sets may differ
• Different statistical algorithms in the tools: Z-Statistic, GSA Restandardization, HapMap
• Different understanding of pathways: names of pathways and gene sets, the number of genes in each gene set, and finer classification of pathways
Explanation:
• Freely available pathway analysis tools may not be up to date, and variations between different tools may exist and affect the outcome
• Reference gene sets should be easily accessed
• More consensus is needed on pathway classification, along with a greater understanding that biological processes are driven by pathways that form networks and interact

Problem: Over-represented pathways
Description:
• Some pathways may be significantly over-represented within a pathway analysis tool, so that gene sets appear important in the output although no correlation exists between the GWAS input dataset and the pathway results. Larger pathways tend to produce larger values, so the numbers being compared become larger
Explanation:
• Statistically or programmatically, a tool may have flaws that bias it towards specific pathways within its database and reference gene set list

  • 18. Molecular changes that occur during tumor progression can serve as screening markers that improve specificity without significantly compromising sensitivity. Successful molecular screening can be defined as the identification of genetic alterations that occur at a specified point in DNA. Pathway analysis is still a relatively new approach in genomic studies; it is a more integrated approach that uses multiple data types together in the same pathway-based tool. Since complex diseases such as cancer can involve multiple pathways, including interactions between various affected genes associated with disease development, it may be ideal to combine several analysis tools for genome-wide association studies. An important limitation of pathway-based analysis of GWAS is the incomplete annotation of the human genome. The function of many human genes is currently unknown, which prevents those genes from being classified into pathways.
Overall, there is no defined standard for what a pathway is, and as a result, software programs that use different databases produce different analysis results. Another limitation of this study is the lack of validation of the results with the control dataset. Improvements in the organization and consensus of gene set and pathway databases may greatly improve the understanding of the cellular mechanisms of genes, pathways, and disease association. [14]

Conclusions and Future Work
  • 19. There is evidence that candidate pathways from the GSA-SNP pathway analysis of the pancreatic cancer survival GWAS results are associated with pancreatic cancer. The reliability of the pathway analysis results was assessed through a comprehensive literature survey using the Gene Ontology terms of the gene sets together with pancreatic cancer as a key term. Each candidate pathway was checked against multiple published papers to support the association between pathway and disease. The example dataset, the KARE Height GWAS data, served as a control dataset for comparison in this study. However, there are no established ways to determine the significance of the association between a pathway, its gene sets, and a specific disease or biological trait. When pathway analysis tools are used to further analyze GWAS results, over-represented cell processes can be classified. Pathway analysis is a potential way to gain greater understanding and value from GWAS, and can prove useful for acquiring insight into disease etiology, risk, diagnosis, and survival time. [5]

Prior studies have suggested that GWAS are underpowered to detect small main effects (over-representations) of genes, and that gene-gene interactions may have a significant role in disease pathogenesis, so that GWAS alone do not realize the full potential of associating genes in pathways with disease. The application of pathway analyses following GWAS can be considered a novel addition to traditional genome-wide association study methods. [20] GWAS aim to find associations between disease phenotypes and genetic alterations. Pathway analyses offer a simple approach that supplements traditional genetic association studies. As a result, pathway analyses may identify relevant gene sets and subsets in pancreatic cancer phenotypes.
The use of pathway-based analysis in this study proved useful for examining the effects of a pathway or group of genes on disease, through the
  • 20. testing of established gene sets of the Gene Ontology database. Results from this study offer insight into how pathway analysis methods could increase the power of GWAS results to detect underlying associated pathways. [21] By studying the results of a GWAS for pancreatic cancer survival from a population of unrelated patients, it can be inferred that patients with pancreatic cancer share a hereditary cancer phenotype. Pathway-based approaches extract more biological information from GWAS, making GWAS results more powerful for gaining insight into disease pathogenesis. Further development of sequencing and analysis tools will improve the power of pathway analysis and of genome-wide association studies.

Future steps for pathway analysis include improvements in computational predictions of cellular processes from genomic and molecular biology, as presented in Table 6. Research must continue into the molecular components involved in the biology of pancreatic cancer. The development of more sensitive and specific molecular approaches to understanding disease is essential to gaining knowledge of the pancreatic cancer progression model. [22] With GWAS and pathway analysis, insight is gained into the individual genetics of cancer, which can improve early prevention and diagnosis. Knowledge of the underlying biological, molecular, and functional pathways can allow novel gene-targeted therapies to be designed and developed. GWAS can identify associations among common alterations within a genome using high-density SNP markers. [10]

A preliminary study has been done to focus further on the genes within the gene sets identified from the pancreatic cancer survival data, to see which genes have the greatest statistical and quantitative correlation with the biology of pancreatic cancer and its related diseases (e.g. diabetes, pancreatic inflammation).
This study is an extension of the pathway analysis
  • 21. results, and its statistical methods were based on the p-values and on the number of genes within each gene set or pathway.

Further pathway analysis experiments on GWAS can be performed to validate the results of this study and to draw better-defined conclusions from more reliable, and therefore more significant, results. For example, a larger sample of SNPs can be used as input to build a broader library of disease pathogenesis information, using the same analysis methods. Comparing the associations of somatic mutations with those of germ-line mutations can also offer greater comprehension of disease mechanisms. Another future improvement would be more consensus on what a pathway is, so that databases containing p-values, pathways, and gene sets become better defined and yield more consistent results regardless of differences in software algorithms. A further extension would be to vary the standardized parameters. As the SNP-to-gene mapping distance and the p-value cutoff are changed, pathway analyses may produce results that differ significantly, both quantitatively and biologically.

Questions that can be answered with further research include: (1) Can pathway analyses further determine and rank which genes within gene sets are most highly associated with disease incidence? (2) Which pathways or genes interact with one another to trigger disease or advance disease stages? (3) Are single-locus mutations identifiable and practical enough for understanding disease?

Better insight into pancreatic cancer can be gained if pathway analysis methods and concepts advance. If a locus and gene are identified, the gene can be targeted in the search for therapies to treat and cure cancer. Development of computational tools to further analyze biological databases such as Gene Ontology will allow greater understanding of results from genomic studies.
With a comprehensive examination of relevant published literature, additional experimental validation
  • 22. can confirm results and support the computational output of algorithms that analyze genomic and gene profiling data, allowing direct comparison and accounting for the biological complexities of cancer survival mechanisms that pertain to specific genes, gene types, and gene interactions. Ultimately, if the proper signaling pathways for pancreatic cancer and other types of cancer are identified, software programs can be created and developed to provide accurate analysis through statistical algorithms. Gene networks may also show abnormalities in gene signaling, leading to the possibility of rational therapeutic gene selection and of finding an identifying mechanism for pancreatic cancer patients.

References

1. Ries LAG, Eisner MP, Kosary CL, Hankey BF, Miller BA, Clegg L, Mariotto A, Feuer EJ, Edwards BK (eds). SEER Cancer Statistics Review, 1975-2002, National Cancer Institute. Bethesda, MD, <http://seer.cancer.gov/csr/1975_2002>
2. Thomasset S.C., Lobo D.N. “Pancreatic Cancer”. Hepatobiliary Surgery II (2010) 28:5, 198-204.
3. Klein, Alison P, et al. “Prospective Risk of Pancreatic Cancer in Familial Pancreatic Cancer Kindreds.” Cancer Research 64.7 (2004): 2634-8. PubMed.
4. Klein, Alison P, et al. “Prospective Risk of Pancreatic Cancer in Familial Pancreatic Cancer Kindreds.” Cancer Research 64.7 (2004): 2634-8. PubMed. Web. 6 September 2011. <http://cancerres.aacrjournals.org/content/64/7/2634.long>.
5. Elbers C.C., Eijk K.R., Frake L., Mulder F., Schouw Y.T., Wijmenga C., Onland-Moret, N.C. “Using Genome-Wide Pathway Analysis to Unravel the Etiology of Complex Diseases”. Genetic Epidemiology 33: 419-431 (2009). Doi: 10.1002/gepi.20395
  • 23.
6. Visscher, Peter M., WG Hill, and NR Wray. “Heritability in the Genomics Era – Concepts and Misconceptions.” Nature Reviews. Genetics 9.4 (2008): 255-66. PubMed. Web. 26 Aug. 2010. <http://www.nature.com/doifinder/10.1038/nrg2322>.
7. Li, J, et al. “A Combined Analysis of Genome-Wide Association Studies in Breast Cancer.” Breast Cancer Research and Treatment (Sept. 2010): PubMed. Web. 21 Aug. 2011. <http://www.springerlink.com/content/kl32p6271h141716/>.
8. Naj, AC, et al. “Dementia Revealed: Novel Chromosome 6 Locus for Late-onset Alzheimer Disease Provides Genetic Evidence for Folate-pathway Abnormalities.” PLoS Genetics 6.9 (2010): e1001130. PubMed. Web. 15 Aug. 2010. <http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001130>.
9. Lambert, JC, et al. “Implication of the Immune System in Alzheimer’s Disease: Evidence from Genome-Wide Pathway Analysis.” Journal of Alzheimer’s Disease: JAD 20.4 (2010): 1107-18. PubMed. Web. 7 September 2011. <http://iospress.metapress.com/content/mj6t4h073843501l/>.
10. Galvan, Antonella, JP Ioannidis, and TA Dragani. “Beyond Genome-Wide Association Studies: Genetic Heterogeneity and Individual Predisposition to Cancer.” Trends in Genetics: TIG 26.3 (2010): 132-41.
11. Cantor, Rita M., K Lange, JS Sinsheimer. “Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application.” American Journal of Human Genetics 86.1 (2010): 6-22. PubMed. Web. 5 September 2011. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801749/?tool=pubmed>.
12. Raimondi, S., Lowenfels, A.B., Morselli-Labate A.M., Maisonneuve P., Pezzilli R. “Pancreatic cancer in chronic pancreatitis; aetiology, incidence, and early detection”. Best Practice & Research Clinical Gastroenterology 24 (2010) 349-358.
13. Vitone L.J., Greenhalf W., McFaul C.D., Ghaneh P., Neoptolemos J.P. “The inherited genetics of pancreatic cancer and prospects for secondary screening”.
Best Practice & Research Clinical Gastroenterology. 20 (2006) 253-283.
14. Li J, et al. “A Combined Analysis of Genome-Wide Association Studies in Breast Cancer.” Breast Cancer Research and Treatment. (2010) PubMed. <http://www.springerlink.com/content/kl32p6271h141716/>.
15. Lambert JC, Boley BG, Chouraki V, Heath S, Zelenika D, Fievet N, Hannequin D, Pasquier F. “Implication of the Immune System in Alzheimer’s Disease: Evidence from Genome-Wide Pathway Analysis.” Journal of Alzheimer’s Disease 20 (2010) 1107-1118. Doi: 10.3233/JAD-2010-100018
16. Wang, K., Li M., Hakonarson, H. “Analysing biological pathways in genome-wide association studies”, Nature Reviews, Volume 11, December 2010. Doi: 10.1038/nrg2884
  • 24.
17. Nam, D., Kim, J., Kim, S.Y., Kim, S. “GSA-SNP: a general approach for gene set analysis of polymorphisms”. Nucleic Acids Research 2010, Vol. 38, 749-754, doi:10.1093/nar/gkq428
18. K. Zhang, S. Chang, et al. (2011). “ICSNPathway: identify candidate causal SNPs and pathways from genome-wide association study by one analytical framework.” Nucleic Acids Res. 39(suppl 2): W437-W443.
19. Zintzaras E, Lau J (2008) Trends in meta analysis of genetic association studies. J Hum Genet 53, 1-9.
20. Zhao et al. “Pathway-based analysis using reduced gene subsets in genome-wide association studies”. BMC Bioinformatics 2011, 12:17. Doi: 10.1186/1471-2105-12-17
21. Vitone L.J., Greenhalf W., McFaul C.D., Ghaneh P., Neoptolemos J.P. “The inherited genetics of pancreatic cancer and prospects for secondary screening”. Best Practice & Research Clinical Gastroenterology, Vol. 20. No. 2. 253-283, 2006. Doi: 10.1016/j.bpg.2005.10.007
22. Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the gene ontology annotations. Nature Reviews Genetics 9, 509-515.
23. Torkamani, Ali, EJ Topol, and NJ Schork. “Pathway Analysis of Seven Common Diseases Assessed by Genome-Wide Association.” Genomics 92.5 (2008): 265-72. PubMed.
24. Medina, I., Montaner, D., Bonifaci, N., Pujana, M.A., Carbonell, J., Tarraga, J., Al-Shahrour, F. and Dopazo, J. (2009) Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies. Nucleic Acids Res, 37, W340-344.
25. Wang, K., Li, M. and Hakonarson, H. (2010) Analysing biological pathways in genome-wide association studies. Nat Rev Genet, 11, 843-854.
26. Zhong, Hua, et al. “Integrating Pathway Analysis and Genetics of Gene Expression for Genome-wide Association Studies.” American Journal of Human Genetics 86.4 (2010): 581-91. PubMed. Web. 30 August 2011. <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2703874/?tool=pubmed>.
27.
“Whole Genome Association Analysis Toolset.” PLINK. Web. 28 August 2011. <http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml>.
28. You L, Chen G, Zhao Y.P. Core signaling pathways and new therapeutic targets in pancreatic cancer. Chin Medical Journal 2010; 123 (9): 1210-1215. Doi: 10.3760/cma.j.issn.0366-6999-2010.09.020
29. “ICSNPathway”. Web. 28 August 2011. <http://icsnpathway.psych.ac.cn>