This document discusses using cloud computing for protein structure prediction and gene expression data analysis. Protein structure prediction is a computationally intensive task that determines the 3D structure of proteins from their amino acid sequences. Cloud computing allows this task to be parallelized across multiple machines to reduce computational time. Gene expression profiling measures the expression levels of thousands of genes at once and is used for cancer prediction and diagnosis. Large gene expression datasets are classified for cancer diagnosis using an extended classifier system that runs on cloud infrastructure, which further divides and parallelizes the problem.
2. Protein structure prediction
• Proteins are chains of amino acids joined
together by peptide bonds.
• Many conformations of this chain are possible
due to the rotation of the chain about each Cα atom.
• These conformational changes are responsible for
differences in the three-dimensional structure of
proteins.
3. Why we are using cloud computing
• Biology applications require high computing
capabilities and often operate on large datasets
that cause extensive I/O operations.
• Protein structure prediction is a computationally
intensive task that is fundamental to different
types of research in the life sciences.
4. Benefits of protein structure prediction
• Manual 3D structure determination is difficult, slow, and
expensive.
• Knowing the structure helps in the design of new
drugs for the treatment of diseases.
• The geometric structure of a protein cannot be
directly inferred from the sequence of genes that
encodes it; it is the result of complex
computations aimed at identifying the structure
that minimizes the required energy.
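The energy-minimization idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three allowed torsion angles, the `toy_energy` function, and the exhaustive search are hypothetical and are not the method used by any real predictor. The sketch only shows why the search grows so quickly with chain length.

```python
import itertools
import math

# Hypothetical discrete torsion angles (degrees) each residue may adopt.
ANGLES = (-60.0, 60.0, 180.0)


def toy_energy(conformation):
    """Made-up energy: each angle prefers -60 degrees, and neighbouring
    residues pay a penalty when their angles differ."""
    local = sum(1.0 - math.cos(math.radians(a + 60.0)) for a in conformation)
    coupling = sum(0.5 for a, b in zip(conformation, conformation[1:]) if a != b)
    return local + coupling


def lowest_energy_conformation(n_residues):
    """Exhaustively score all len(ANGLES) ** n_residues conformations."""
    return min(itertools.product(ANGLES, repeat=n_residues), key=toy_energy)


best = lowest_energy_conformation(4)
print(best, toy_energy(best))

# Size of the search space for a 100-residue chain under the same assumption.
print(len(ANGLES) ** 100)
```

With only three choices per residue, a four-residue chain has 81 conformations, while a 100-residue chain has 3 to the power 100, which is why exhaustive search is not an option and the work is spread over many machines.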
5. (Figure: web portal and cloud service for protein structure prediction)
6. • In the above figure, the web portal frees
scientists from managing the prediction task; all
the work is done by the cloud service.
The support vector machine implementation divides
the pattern recognition problem into three phases:
• initialization,
• classification,
• and a final phase.
These phases are executed in parallel where possible
to reduce the computational time of the prediction.
The prediction algorithm is then translated into a
task graph that is submitted to Aneka. Once the
task is completed, the middleware makes the
results available for visualization through the
portal.
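The three-phase structure described above can be sketched as a small task graph. This is a minimal illustration that runs on a local thread pool, not on Aneka, and it does not use the Aneka API. The function names, the window size, the residue set, and the `H`/`C` labels are hypothetical placeholders; `classify` is a trivial rule standing in for a trained classifier.

```python
from concurrent.futures import ThreadPoolExecutor


def initialize(sequence, window=3):
    """Initialization phase: cut the sequence into overlapping windows."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def classify(window):
    """Classification phase: placeholder rule, not a real trained model."""
    hydrophobic = sum(residue in "AILMFVW" for residue in window)
    return "H" if hydrophobic >= 2 else "C"


def finalize(labels):
    """Final phase: merge the per-window labels into one prediction string."""
    return "".join(labels)


def predict(sequence):
    windows = initialize(sequence)
    # The classification tasks are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        labels = list(pool.map(classify, windows))  # map keeps input order
    return finalize(labels)


print(predict("AILGGSK"))
```

In this sketch the classification tasks fan out across workers, while initialization and the final phase act as the entry and exit nodes of the graph.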
7. Gene expression data analysis
• Gene expression profiling is the measurement of
the expression levels of thousands of genes at
once. Consequently, it is widely used for cancer
prediction.
• It is also used in medical diagnosis and drug
design.
8. Cancer
• Cancer is a disease characterized by uncontrolled
cell growth and proliferation. This behavior occurs
because genes regulating the cell growth mutate.
This means that all the cancerous cells contain
mutated genes.
• This uncontrolled growth produces different
types of tumors. In this context, gene expression
profiling is utilized to provide a more accurate
classification of tumors.
• The dimensionality of typical gene expression
datasets ranges from several thousand to tens of
thousands of genes.
9. • For such large datasets, classification is
performed with the eXtended Classifier System (XCS),
which has been successfully utilized for classifying
large datasets.
• Cloud-CoXCS is a machine learning
classification system for gene expression
datasets on the Cloud infrastructure. It extends
the XCS model by introducing a coevolutionary
approach.
• CoXCS divides the entire search space into
subdomains and employs the standard XCS
algorithm in each of these subdomains.
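The divide-and-classify idea can be sketched as follows. This is not XCS or CoXCS: a simple nearest-centroid learner is swapped in for the XCS algorithm, the coevolutionary exchange between subdomains is not modelled, and the work runs on a local thread pool in place of cloud nodes. All function names and the expression values are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_features(n_features, n_subdomains):
    """Partition feature (gene) indices into interleaved subdomains."""
    return [list(range(start, n_features, n_subdomains))
            for start in range(n_subdomains)]


def train_centroids(samples, labels, features):
    """Stand-in learner (not XCS): per-class mean over one subdomain."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    centroids = {label: [v / counts[label] for v in acc]
                 for label, acc in sums.items()}
    return features, centroids


def predict_one(model, row):
    """Label of the closest centroid within one subdomain."""
    features, centroids = model

    def distance(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))

    return min(centroids, key=distance)


def fit(samples, labels, n_subdomains):
    """Train one independent model per subdomain, concurrently."""
    subdomains = split_features(len(samples[0]), n_subdomains)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda fs: train_centroids(samples, labels, fs), subdomains))


def predict(models, row):
    """Combine the subdomain predictions by majority vote."""
    votes = Counter(predict_one(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Made-up expression values: 4 samples x 6 genes.
samples = [[5.0, 5.1, 4.9, 5.2, 5.0, 4.8],
           [4.8, 5.0, 5.1, 4.9, 5.2, 5.0],
           [1.0, 0.9, 1.2, 1.1, 0.8, 1.0],
           [1.1, 1.0, 0.9, 1.2, 1.0, 0.9]]
labels = ["tumor", "tumor", "normal", "normal"]
models = fit(samples, labels, n_subdomains=3)
print(predict(models, [4.9, 5.0, 5.0, 5.1, 4.9, 5.0]))
```

Because each subdomain sees only a slice of the genes, its model can be trained independently of the others, which is what makes the problem easy to distribute.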