Usage Patterns to Provision for Scientific Experimentation in Clouds
Eran Chinthaka Withana and Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
2nd International Conference on Cloud Computing Technology and Science, Indianapolis, IN, US

Summary
- Doing Science in Cloud
- Improving Scientific Job Executions in Cloud Resources
- Role of Successful Predictions to Reduce Startup Overheads
- System Architecture
- Use of Reasoning
- Evaluation
- Discussion and Future Work

Clouds as a Complementary Solution to Grids for Science
- Issues with existing systems
  - Batch-oriented HPC resources with long queue wait times, even under moderate loads
  - No access transparency
  - Quota system requires maximum resources to be known and approved in advance
- Advantages of using cloud resources
  - Availability of “unlimited” compute resources the instant they are needed
  - Pay-as-you-go model eliminates up-front commitments
  - Encourages scientists to budget for the resources they are willing to pay for
- Issues with clouds
  - Slow interconnects, virtualization overhead, and startup times
  - Consumption-based billing
- Emergence of new programming paradigms to exploit the advantages of cloud resources

Challenges with Cloud Computing Resources
- Scheduling algorithms
  - Existing algorithms focus on optimal utilization of relatively homogeneous grid or cluster resources
  - In clouds, resources can be provisioned to match user requirements
- Prediction algorithms
  - Differing hardware configurations force execution-time predictions to account for non-uniform resources

Improving Scientific Job Executions in Cloud Resources
- Solution space
  - Meta-scheduler that uses historical information to anticipate future activity (AppLeS, GrADS)
  - Resource abstraction service (Nimrod/G)
  - Reducing the impact of startup overheads by learning from user behavioral patterns and predicting future jobs
- Talk outline
  - Algorithm to predict future jobs by extracting user patterns from historical information
    - Reduces the impact of high startup overheads for time-critical applications
  - Use of knowledge-based techniques
    - Starts from zero knowledge or from pre-populated job information describing the connections between jobs
    - Similar cases retrieved are used to predict future jobs, reducing high startup overheads
  - Algorithm assessment
    - Two workloads: individual scientific jobs executed at LANL, and a set of workflows executed by three users

Use Case
- The suite of workflows can differ from domain to domain, with WRF (Weather Research and Forecasting) as the upstream node
  - Meteorologists run pre-processing jobs to generate visualizations of parameters
  - In agriculture, scientists use it for crop prediction
  - Wildfire propagation and prediction
  - Generating visualizations for mobile phones using NCL scripts
  - Atmospheric scientists use it for optimal placement of wind farms
- User patterns reveal the sequence of jobs, taking different users and domains into consideration
- Useful for a science gateway serving a wide range of mid-scale scientists
[Figure: WRF as the upstream node for Weather Predictions, Crop Predictions, Wind Farm Location Evaluations, and Wild Fire Propagation Simulation]

Role of Successful Predictions to Reduce Startup Overheads
- The largest gain is achieved when prediction accuracy is high and setup time (s) is large relative to execution time (t)
- r = probability of a successful prediction (prediction accuracy)
- Percentage time reduction = [formula not preserved in this transcript]
- For simplicity, assuming equal job execution and startup times: percentage time reduction = [formula not preserved in this transcript]

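One way to arrive at a formula with the properties described on this slide is sketched below. It is a reconstruction under two assumptions that the slide does not state: a correct prediction hides the entire startup time s, and a wrong prediction costs nothing beyond the normal startup. It should not be read as the authors' exact expression.

```latex
\begin{align*}
T_{\text{without prediction}} &= s + t \\
\mathbb{E}\!\left[T_{\text{with prediction}}\right] &= r\,t + (1 - r)(s + t) = t + (1 - r)\,s \\
\text{Percentage time reduction} &= 100 \cdot \frac{(s + t) - \bigl(t + (1 - r)\,s\bigr)}{s + t}
  = 100 \cdot \frac{r\,s}{s + t} = \frac{100\,r}{1 + t/s}
\end{align*}
```

Under these assumptions, setting s = t gives a reduction of 50r percent, so even perfect predictions would save at most half of the total time when startup and execution times are equal.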
Relationship of Predictions to Execution Time
- Observations
  - Percentage time reduction increases with the accuracy of predictions
  - Time reduction is reduced exponentially with increased work-to-overhead ratio (t/s)
- Need to find the critical point for a given situation
  - Fix the required percentage time reduction for a given t/s ratio and find the required accuracy of predictions
- Cost of wrong predictions
  - Depends on the compute resource
- Accuracy of predictions = total successful future job predictions / total predictions
[Figure: percentage time reduction]

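The "critical point" search described above can be sketched in Python. The model used here is an assumption, not taken from the slides: a correct prediction hides all of the startup time s and a wrong one costs nothing extra, which gives a reduction of 100·r·s/(s+t). The function names are hypothetical.

```python
def percentage_time_reduction(r: float, t: float, s: float) -> float:
    """Expected percentage reduction in turnaround time (assumed model).

    r: prediction accuracy in [0, 1]; t: execution time; s: startup time.
    Assumption: a hit hides all of s, a miss costs nothing extra.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if t < 0 or s < 0 or t + s == 0:
        raise ValueError("t and s must be non-negative and not both zero")
    return 100.0 * r * s / (s + t)


def required_accuracy(target_pct: float, t_over_s: float):
    """Accuracy needed to reach target_pct at a given t/s ratio.

    Returns None when even perfect predictions cannot reach the target.
    """
    r = (target_pct / 100.0) * (1.0 + t_over_s)
    return r if r <= 1.0 else None


def prediction_accuracy(successful: int, total: int) -> float:
    """Accuracy as defined on the slide: successful / total predictions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total
```

For a fixed t/s ratio, `required_accuracy` answers the question posed on the slide: how accurate the predictor must be before a desired time reduction becomes reachable.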
Prediction Engine: System Architecture
[Figure: system architecture diagram; surviving label: Prediction Retriever]

Use of Reasoning
- Store and retrieve cases
- Steps
  - Retrieval of similar cases
    - Similarity measurement
    - Use of thresholds
  - Reuse of old cases
  - Case adaptation
  - Storage

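The retrieve, reuse, and store steps listed above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the class names, the attribute names, the exact-match similarity measure, and the default threshold are chosen for illustration only, and case adaptation is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    attributes: dict  # hypothetical attributes, e.g. {"user": "alice", "app": "wrf"}
    next_job: str     # goal variable: the job that followed this one


@dataclass
class CaseBase:
    threshold: float = 0.5  # minimum similarity for a case to be reused (assumed value)
    cases: list = field(default_factory=list)

    @staticmethod
    def similarity(a: dict, b: dict) -> float:
        """Fraction of attribute keys on which both cases agree."""
        keys = set(a) | set(b)
        if not keys:
            return 0.0
        agree = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
        return agree / len(keys)

    def retrieve(self, query: dict) -> list:
        """Return (score, case) pairs at or above the threshold, best first."""
        scored = [(self.similarity(query, c.attributes), c) for c in self.cases]
        kept = [pair for pair in scored if pair[0] >= self.threshold]
        return sorted(kept, key=lambda pair: pair[0], reverse=True)

    def predict_next_job(self, query: dict):
        """Reuse the best-matching old case; None if nothing is similar enough."""
        matches = self.retrieve(query)
        return matches[0][1].next_job if matches else None

    def store(self, attributes: dict, observed_next_job: str) -> None:
        """Record what actually happened so later queries can reuse it."""
        self.cases.append(Case(dict(attributes), observed_next_job))
```

An empty `CaseBase` corresponds to the zero-knowledge start: `predict_next_job` returns `None` until `store` has recorded at least one sufficiently similar case.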
Case Similarity Calculation
- Each case is represented by a set of attributes
- Attributes are selected by measuring their effect on the goal variable (next job)

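The similarity formula itself is not given in the text. A common choice in case-based reasoning, assumed here purely for illustration, is a normalized weighted sum of per-attribute matches, where a larger weight marks an attribute with more influence on the goal variable. The function name, attribute names, and weights below are hypothetical.

```python
def weighted_similarity(case_a: dict, case_b: dict, weights: dict) -> float:
    """Normalized weighted fraction of attributes on which two cases agree.

    weights maps attribute name -> non-negative importance; attributes
    missing from either case count as a mismatch.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(
        w
        for attr, w in weights.items()
        if attr in case_a and attr in case_b and case_a[attr] == case_b[attr]
    )
    return matched / total


# Hypothetical weights: the application matters more than the user.
example_weights = {"user": 1.0, "application": 3.0}
score = weighted_similarity(
    {"user": "alice", "application": "wrf"},
    {"user": "bob", "application": "wrf"},
    example_weights,
)
print(score)
```

A score like this would then be compared against a threshold to decide whether an old case is similar enough to reuse.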
Evaluation
- Use cases
  - Individual job workload¹: 40k jobs over two years from the 1024-node CM-5 at Los Alamos National Lab
  - Workflow use case
¹ Parallel Workload Archive: http://www.cs.huji.ac.il/labs/parallel/workload/

Evaluation: Average Accuracy of Predictions
[Figure: two panels, "Individual Jobs Workload" and "Workflow Workload"]

Evaluation: Time Saved
- Amount of time that can be saved if the resources are already provisioned when the job is ready to run
- Startup time
  - Assumed to be 3 minutes (average for commercial providers)
[Figure: two panels, "Individual Jobs Workload" and "Workflow Workload"]

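As a rough illustration of how such a figure could be computed from a prediction log, the Python sketch below counts each correct prediction as saving one full startup period. That counting rule is an assumption, as are the function name and the sample job names; only the 3-minute startup time comes from the slide.

```python
STARTUP_SECONDS = 3 * 60  # the 3-minute startup time assumed on the slide


def time_saved_seconds(predictions, actual_jobs, startup_seconds=STARTUP_SECONDS):
    """Total startup time avoided, counting one startup per correct prediction.

    predictions[i] is the job predicted before actual_jobs[i] arrived,
    or None when no prediction was made.
    """
    if len(predictions) != len(actual_jobs):
        raise ValueError("predictions and actual_jobs must have the same length")
    hits = sum(
        1 for p, a in zip(predictions, actual_jobs) if p is not None and p == a
    )
    return hits * startup_seconds


# Hypothetical log: two of the four predictions match the job that arrived.
saved = time_saved_seconds(
    ["wrf", "ncl", None, "wrf"],
    ["wrf", "wrf", "ncl", "wrf"],
)
print(saved)
```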
Evaluation: Prediction Accuracies for Use Cases

Discussion and Future Work
- Accuracy
  - 78% for individual jobs
  - 96% for workflow workload
- The number of jobs required to make the system stable depends on the uniqueness and distribution of unique applications
- The amount of time that can be saved using future job prediction is inversely proportional to the t/s ratio
- More accurate methods to prune features and identify weights
- Evaluation of machine learning techniques as an alternative to knowledge-based systems
- Combining future job predictions with job reliability predictions to further improve the throughput of job executions

Related Work
[1] M. Armbrust et al., “Above the clouds: A Berkeley view of cloud computing,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009.
[2] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[3] C. Catlett, “The philosophy of TeraGrid: building an open, extensible, distributed TeraScale facility,” in ACM International Symposium on Cluster Computing and the Grid. IEEE Computer Society, 2002.
[4] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. Sprenkle, “Dynamic virtual clusters in a grid site manager,” in HPDC. IEEE Computer Society, 2003, pp. 90–103.
[5] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, “A case for grid computing on virtual machines,” in ICDCS ’03: Proceedings of the 23rd International Conference on Distributed Computing Systems. Washington, DC, USA: IEEE Computer Society, 2003, p. 550.
[6] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, “Virtual clusters for grid communities,” in CCGRID ’06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid. Washington, DC, USA: IEEE Computer Society, 2006, pp. 513–520.
[7] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual workspaces for scientific applications,” Journal of Physics: Conference Series, vol. 78, p. 012038 (5pp), 2007.
[8] B. Sotomayor, K. Keahey, and I. Foster, “Overhead matters: A model for virtual resource management,” in VTDC ’06: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing. Washington, DC, USA: IEEE Computer Society, 2006, p. 5.
………………………………………………………….
[12] F. Berman et al., “Adaptive computing on the grid using AppLeS,” IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 369–382, 2003.
[13] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey et al., “The GrADS project: Software support for high-level grid application development,” International Journal of High Performance Computing Applications, vol. 15, no. 4, p. 327, 2001.
[14] R. Buyya, D. Abramson, and J. Giddy, “Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid,” in HPC. IEEE Computer Society, 2000, p. 283.

Thank You !!
