Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/cloudMC-a-cloud-computing-map-reduce-implementation-for-radiotherapy/ruben-jimenez-and-hector-miras
CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN JIMENEZ & HECTOR MIRAS at Big Data Spain 2012
1. CloudMC: A cloud computing
map-reduce implementation
for radiotherapy
Rubén Jiménez Marrufo
Héctor Miras del Río
Carlos Miras del Río
Carles Gomà Estadella
Big Data Spain
http://www.bigdataspain.org
Madrid, November 16th, 2012
2. Contents
Introduction
Radiotherapy
Monte Carlo simulations for radiation transport
Monte Carlo parallelization
Clustering vs. Cloud Computing
Cloud Computing for clinical radiation transport
CloudMC
DEMO START
Architecture
Map Reduce
Elasticity
How did Radarc help us?
Results
Is it reinventing the wheel?
Roadmap
DEMO RESULTS
Questions & Answers
3. Introduction
Héctor Miras del Río
Department of Medical Physics,
Virgen Macarena Hospital,
Seville, Spain
Rubén Jiménez Marrufo
R&D Division,
Icinetic TIC S.L.,
Seville, Spain
Carlos Miras del Río
R&D Division,
Wedoit Innovacion Tecnologica,
Seville, Spain
Carles Gomà
Centre for Proton Therapy,
Paul Scherrer Institute,
Villigen PSI, Switzerland
4. Introduction
Monte Carlo Simulations
Cloud Computing
Radiotherapy
5. Radiotherapy
Radiotherapy: the medical use of ionizing radiation, generally as
part of cancer treatment, to control or kill malignant cells.
Radiotherapy treatment planning: the process of calculating, prior to
radiotherapy, the radiation dose that the irradiated object will absorb.
8. Monte Carlo simulation for
radiation transport
Monte Carlo Simulations:
+ Gold standard algorithms for radiation calculations
- Extremely computationally intensive and very time-consuming.
9. Monte Carlo parallelization
Parallelization: run one simulation simultaneously on several nodes
and merge the results.
Monte Carlo simulations are highly parallelizable because the primary
events are independent.
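The idea of splitting independent primary events across nodes and merging the results can be illustrated with a toy example. This is only a sketch, not code from CloudMC: the "simulation" is a plain average of random samples, the names `run_batch` and `run_split` are hypothetical, and seeding each node with `base_seed + i` is an assumption made for brevity (a production code would use a generator designed for independent parallel streams).

```python
import random
from statistics import fmean


def run_batch(seed: int, histories: int) -> float:
    """Toy 'simulation': mean of `histories` independent uniform samples."""
    rng = random.Random(seed)  # each node uses its own seeded generator
    return fmean(rng.random() for _ in range(histories))


def run_split(total_histories: int, n_nodes: int, base_seed: int = 1234) -> float:
    """Split the histories evenly over n_nodes and merge the partial means.

    Assumes total_histories is divisible by n_nodes, so every batch has the
    same size and the merged estimate is the plain average of the partials.
    """
    per_node = total_histories // n_nodes
    partials = [run_batch(base_seed + i, per_node) for i in range(n_nodes)]
    return fmean(partials)


if __name__ == "__main__":
    print(run_split(100_000, 4))
```

Because the batches never exchange data, they could run on separate machines in any order; only the final merge needs all partial results.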
11. Cloud Computing for clinical
radiation calculations
tCPU = 100 h
Number of instances: n = 100
T(n) = 1.44 h
Extra-small instance: 0.0142 € / h
Cost / plan: 2 €
1000 patients / year
100 cores cluster ≈ 20 000 €
160 years of computing time in an extra-small instance
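The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that instances are billed pro rata at the quoted hourly price and that each patient needs one plan; neither assumption is stated on the slide, and the variable names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

price_per_hour = 0.0142   # EUR per hour, extra-small instance (from the slide)
n_instances = 100         # n (from the slide)
clock_hours = 1.44        # T(n) for n = 100 (from the slide)
plans_per_year = 1000     # assumption: one plan per patient
cluster_price = 20_000    # EUR, 100-core cluster (from the slide)

# Cost of one plan: n instances running for T(n) hours each.
cost_per_plan = n_instances * clock_hours * price_per_hour

# Yearly cloud bill for all plans.
cost_per_year = cost_per_plan * plans_per_year

# Instance time that the cluster price would buy, expressed in years.
equivalent_years = cluster_price / price_per_hour / HOURS_PER_YEAR

print(f"cost per plan:    {cost_per_plan:.2f} EUR")
print(f"cost per year:    {cost_per_year:.0f} EUR")
print(f"cluster price in instance-years: {equivalent_years:.0f}")
```

Under these assumptions the cost per plan comes out at about 2 € and the cluster price buys roughly 160 years of extra-small instance time, which matches the numbers quoted on the slide.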
12. CloudMC
CloudMC offers an implementation of map/reduce on the Windows Azure
cloud computing platform for parallelizing MC simulations of
radiation therapy dose distributions.
Non-intrusive
Multi-application:
Penelope
Geant4
EGSnrc
Elasticity:
Resources are not reserved
A 1-hour simulation costs 1 hour of resources
15. CloudMC: MapReduce
Sequence of actions when carrying out a MC simulation on n instances:
1. New Simulation
  - Simulation metadata is saved on SQL Azure.
  - Simulation files are uploaded to the Azure Storage.
2. Map
  - Generation of n independent simulation seeds.
  - Mapper: modification of the config to divide the histories by n.
  - Provisioning of n-1 worker roles.
  - Sending of n "start" messages.
3. Parallel Execution
  Every worker role:
  1. Reads a message from the queue and downloads the simulation files.
  2. Executes the "fragmented" simulation.
  3. Sends the results to the storage.
  4. Sends an "end of simulation" message.
4. Reduce
  When the web role reads the n "end of simulation" messages:
  - The worker roles are scaled down.
  - The web role proceeds to download the n results.
  - The reducer merges the n results.
5. End of simulation
  - Finished simulation metadata is saved on SQL Azure.
  - Results are uploaded to the storage.
  - Mail notifies the user of the end of the simulation.
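The message flow between the web role and the worker roles can be mimicked locally with queues and threads. This is a minimal sketch under several assumptions: in-memory `queue.Queue` objects stand in for the Azure queues, a dictionary stands in for cloud storage, the "simulation" is a toy sum of random numbers, and the names `worker_role` and `run_simulation` are hypothetical rather than taken from CloudMC.

```python
import queue
import random
import threading


def worker_role(start_q, end_q, storage):
    """One worker: read a start message, simulate, store, report the end."""
    msg = start_q.get()                        # 1. read a "start" message
    rng = random.Random(msg["seed"])
    # 2. run the "fragmented" simulation (toy: sum of uniform samples)
    result = sum(rng.random() for _ in range(msg["histories"]))
    storage[msg["part"]] = result              # 3. send the result to storage
    end_q.put({"part": msg["part"]})           # 4. send "end of simulation"


def run_simulation(total_histories, n, base_seed=42):
    """Web-role side: map, wait for n end messages, then reduce.

    Assumes total_histories is divisible by n.
    """
    start_q, end_q, storage = queue.Queue(), queue.Queue(), {}
    per_part = total_histories // n

    # Map: one "start" message per instance, each with its own seed.
    for part in range(n):
        start_q.put({"part": part, "seed": base_seed + part,
                     "histories": per_part})

    workers = [threading.Thread(target=worker_role,
                                args=(start_q, end_q, storage))
               for _ in range(n)]
    for w in workers:
        w.start()

    # Reduce starts only after all n "end of simulation" messages arrive.
    for _ in range(n):
        end_q.get()
    for w in workers:
        w.join()

    # Merge the n partial sums into one mean value.
    return sum(storage[p] for p in range(n)) / (n * per_part)


if __name__ == "__main__":
    print(run_simulation(40_000, 4))
```

Each worker writes its result before sending the end message, so once the web role has counted n end messages every partial result is guaranteed to be available for the merge.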
16. CloudMC: Map
Most MC applications for radiation transport simulation read their
configuration from text files.
Input A: configuration files (Histories: 1015)
• Simulation parameters
• Histories count
• Geometry & materials files
• …
• MapReduce Parameters
Mapper: parametrized mapper to set histories number and seeds in the input files
Input B: mapped configuration files (Histories: 215), one per Executable
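A parametrized mapper of this kind can be sketched as a text rewrite of the configuration file. The `HISTORIES` and `SEED` keywords below are a made-up input format used only for illustration; real codes such as PENELOPE, Geant4 or EGSnrc each have their own syntax, which is what the mapper parameters would have to describe. The function name `map_config` is also hypothetical.

```python
import re

HISTORIES_RE = re.compile(r"^HISTORIES[ \t]+(\d+)[ \t]*$", re.MULTILINE)
SEED_RE = re.compile(r"^SEED[ \t]+\d+[ \t]*$", re.MULTILINE)


def map_config(template: str, n: int, base_seed: int = 1) -> list[str]:
    """Return n copies of the config, each with its share of histories
    and its own seed. All other lines are left untouched."""
    match = HISTORIES_RE.search(template)
    if match is None:
        raise ValueError("no HISTORIES line found in the configuration")
    total = int(match.group(1))
    share, extra = divmod(total, n)

    configs = []
    for i in range(n):
        # Spread the remainder over the first `extra` fragments.
        histories = share + (1 if i < extra else 0)
        text = HISTORIES_RE.sub(f"HISTORIES {histories}", template)
        text = SEED_RE.sub(f"SEED {base_seed + i}", text)
        configs.append(text)
    return configs


if __name__ == "__main__":
    template = "HISTORIES 10\nSEED 5\nGEOMETRY phantom.geo\n"
    for cfg in map_config(template, 3):
        print(cfg)
```

Keeping the patterns as parameters rather than hard-coding them is what would let one mapper serve several MC applications.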
17. CloudMC: Reduce
The results of MC applications for radiation transport simulation are
dose, energy, or other magnitude distribution files formatted in columns.
Reducer: parametrized reducer that combines the columns of the n mapped
dose distribution files into one output, depending on the column type:
- Magnitude column
- Uncertainty column
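A column-wise reducer could look like the sketch below. The combination rules are assumptions, not taken from the slides: magnitude columns are averaged, uncertainty columns are combined in quadrature and divided by n (valid when every fragment ran the same number of histories and the uncertainties are absolute), and any other column, such as a coordinate, is copied from the first file. The column type labels and the name `reduce_rows` are illustrative.

```python
import math


def reduce_rows(files, column_types):
    """Merge n result tables with identical shape.

    files: list of tables, each a list of rows of floats.
    column_types: one of 'key', 'magnitude', 'uncertainty' per column.
    """
    n = len(files)
    merged = []
    for rows in zip(*files):            # same row taken from every file
        out = []
        for col, kind in enumerate(column_types):
            values = [row[col] for row in rows]
            if kind == "magnitude":
                out.append(sum(values) / n)
            elif kind == "uncertainty":
                out.append(math.sqrt(sum(v * v for v in values)) / n)
            else:                       # 'key': e.g. a voxel coordinate
                out.append(values[0])
        merged.append(out)
    return merged


if __name__ == "__main__":
    a = [[0.0, 2.0, 0.3]]
    b = [[0.0, 4.0, 0.4]]
    print(reduce_rows([a, b], ["key", "magnitude", "uncertainty"]))
```

Fragments with unequal history counts would need history-weighted averages instead of the plain mean used here.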
18. CloudMC: MapReduce DSL
CloudMC uses a MapReduce DSL whose parameters adapt the Mapper and
Reducer to specific MC applications.
[Figure: Mapper parameters and Reducer parameters]
19. CloudMC: Elasticity
Users choose the number of instances to use for each simulation.
CloudMC scales up the worker roles to run the simulation and scales
them down when it finishes.
Windows Azure Service Management allows role scaling:
+ REST API
+ Based on XML config files
- Minimum of 1 instance
- Impossible to scale down specific instances (multi-tenant)
20. CloudMC: How did Radarc help us?
≃ 50% generated code:
• ASP.Net MVC 3 UI
• C# App Services
• C# POCO Entities
• EF CodeFirst
• SQL Azure DB
Focus on domain core: map/reduce, provisioning, fault tolerance, etc.
[Architecture diagram: UI; Service Management; Azure Services Provisioning; MapReduce; Formula; Entities; Factory; Repositories; Worker Roles; Cloud Hosted Services; Users & accounts; User Simulation; Simulation files; Messages Queues; SQL Azure; Cloud Storage]
21. CloudMC: Results
Case Study:
Simulation: ¹²⁵I seed in ophthalmic
applicator.
Number of histories: 3·10⁹
MC Code: PENELOPE, main program
PenEasy.
Results:
Worker instances size: extra-small
Clock time in 1 instance: 30 h
Clock time in 64 instances: 48 min
(speed up = 37x)
22. CloudMC: Results
Time vs number of instances study
T(n): Clock time for 1 simulation on
n instances.
tcpu: Overall time used only in the
simulation of n histories.
Δt0: Non-parallelizable time for 1
instance.
a: Non-parallelizable part of time
proportional to n.
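The fitted expression itself did not survive in the text of this slide. One form that is consistent with the four definitions above, offered here as an assumption and not as the formula the authors used, adds a perfectly parallel term to a constant overhead and an overhead that grows with the number of instances:

```latex
T(n) = \frac{t_{\mathrm{cpu}}}{n} + \Delta t_0 + a\,n
```

Under this assumed model, the figures quoted on slide 11 (tCPU = 100 h, n = 100, T(n) = 1.44 h) would leave Δt0 + 100·a ≈ 0.44 h of non-parallelizable time, and the clock time would stop improving once the a·n term outgrows tCPU / n.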
23. CloudMC: Is it reinventing the wheel?
Why not use Amazon Elastic MapReduce?
(http://aws.amazon.com/es/elasticmapreduce)
• Our mapper and reducer were written for .NET
http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net
Why not use Hadoop on Azure?
(http://www.hadooponazure.com)
• First preview released in 2012.
• The cluster size must be reserved.
24. Roadmap
Testing with more MC applications: Geant4, EGSnrc, etc.
Support packages with specific MapReduce implementations
• Application to different domains
• Use of MEF to provide Mappers and Reducers in simulation
packages
SDK to develop specific MapReduce implementation packages.
• Visual Studio Templates could facilitate the development of
CloudMC packages
Enable multi-tenant environments
• Concurrent simulations require scaling down specific
instances, which is not possible on Windows Azure.
26. Thank you for your attention …
CloudMC soon available at:
https://cloudmontecarlo.cloudapp.net
hector.miras@gmail.com
@hmiras
rjimenez@icinetic.com
@rjimenez