2. Who?
Tomasz Miklas
a.k.a. Guiness, tqm
@tomaszmiklas
Interests: Non-standard solutions, All things wireless, Photography, Linux, Security, Perl, HA, Gadgets, Scalability, Technology, K.I.S.S.
3. Why?
Originally to test statistical models…
Billions of independent passes (code executions)
Minimal I/O – just the parameters and hopefully some output
Clean recovery from hardware/software faults – bad things happen!
Use available resources in the most effective way
In other words… I need quite a lot of CPU cycles :-)
…almost like BOINC!
6. Requirements #2
Support SMP if available
Run different tasks at the same time
Very basic monitoring and reporting
Client capabilities checking/recording (TO DO)
Task assignment based on capabilities (TO DO)
Graceful remote node shutdown (TO DO)
7. Implementation
Controller
Brain of the whole grid
Perl
CGI
No database :-)
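A controller with no database still has to remember which work units are pending, running, or finished. One way to do that is with plain files, using an atomic rename so that each unit goes to exactly one client. The sketch below is in Python rather than the controller's Perl, and the directory layout (`pending`, `running`, `done`) and all function names are hypothetical, not taken from the actual code.

```python
import os

def init_queue(root):
    """Create the three state directories used by this sketch."""
    for state in ("pending", "running", "done"):
        os.makedirs(os.path.join(root, state), exist_ok=True)

def add_unit(root, unit_id, params):
    """Store one work unit as a text file holding its parameters.

    Parameters are space-separated, so they must not contain spaces.
    """
    path = os.path.join(root, "pending", unit_id)
    with open(path, "w") as fh:
        fh.write(" ".join(params))

def claim_unit(root, client_id):
    """Hand the next pending unit to a client, or return None."""
    pending = os.path.join(root, "pending")
    for unit_id in sorted(os.listdir(pending)):
        src = os.path.join(pending, unit_id)
        dst = os.path.join(root, "running", unit_id + "." + client_id)
        try:
            # rename is atomic within one filesystem: only one claimer wins
            os.rename(src, dst)
        except FileNotFoundError:
            continue  # another request claimed this unit first
        with open(dst) as fh:
            return unit_id, fh.read().split()
    return None

def finish_unit(root, unit_id, client_id, output):
    """Record the client's STDOUT and retire the unit."""
    running = os.path.join(root, "running", unit_id + "." + client_id)
    with open(os.path.join(root, "done", unit_id), "w") as fh:
        fh.write(output)
    os.remove(running)
```

In a layout like this, units left in `running` after a node dies could simply be moved back to `pending`, which is one way to get clean recovery from faults.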
8. Implementation
Runner
Creates unified platform to run the code
Debian-based LiveCD built for the purpose…
Very easy, well documented!
Live USB and other options available
Single shell script – downloader for client app
9. Implementation
Grid client
Responsible for communications and task execution
Latest version downloaded at runner start-up
Identify host capabilities – use as much as possible
Run task with parameters, collect STDOUT
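The client duties above (detect host capabilities, run a task with parameters, collect STDOUT) can be sketched roughly as follows. This is Python rather than the project's Perl, the function names are hypothetical, and "capabilities" is reduced here to the core count only.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def detect_cores():
    """Number of CPU cores visible to this host (at least 1)."""
    return os.cpu_count() or 1

def run_task(command, params, timeout=None):
    """Run one task with its parameters and capture its STDOUT."""
    result = subprocess.run(
        [command, *params],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout

def run_batch(command, param_sets, workers=None):
    """Run several tasks at once, one worker per core by default.

    Threads are enough here: each one only waits for a child process,
    and the child processes are what keep the cores busy.
    """
    workers = workers or detect_cores()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_task(command, p), param_sets))
```

Results come back in the same order as `param_sets`, so the client can report each output against the work unit that produced it.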
10. Actual task
Takes command line parameters, results to STDOUT
Can be another level of indirection (downloader, etc)
Usually it is a harness to make real app work
Just make sure LiveCD has all the libraries needed…
… or use static linking :-)
Perl apps can be treated with perlcc or PAR
11. Trivia
Brute-force MD5 hashes
Kind of Map-Reduce algorithm - not quite there yet
Parameters:
1. Wanted hash
2. Variable part length
3. Plaintext prefix
4. Variable part characters
5. Optional: the word "space" to add the space character to the character set
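The five parameters listed above map naturally onto a small search function. The sketch below is a Python stand-in for the Perl script, not its actual code; `brute_md5` and `DEFAULT_CHARS` are hypothetical names, and the default character set is an assumption because the slides do not say what the real default is.

```python
import hashlib
import itertools
import string

# Assumed default character set; the real script's default is not stated.
DEFAULT_CHARS = string.ascii_lowercase + string.digits

def brute_md5(wanted, length, prefix="", chars=DEFAULT_CHARS, with_space=False):
    """Return the plaintext whose MD5 equals `wanted`, or None.

    wanted     -- hex digest to look for
    length     -- number of variable characters after the prefix
    prefix     -- known start of the plaintext
    chars      -- characters to try in the variable part
    with_space -- also try the space character
    """
    if with_space:
        chars += " "
    wanted = wanted.lower()
    for combo in itertools.product(chars, repeat=length):
        candidate = prefix + "".join(combo)
        if hashlib.md5(candidate.encode()).hexdigest() == wanted:
            return candidate
    return None
```

The search space is `len(chars) ** length` candidates, which is why a known prefix and a narrow character set matter so much.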
12. Example
MD5 ("password") = 5f4dcc3b5aa765d61d8327deb882cf99
# use default characters
./md5brute.pl 5f4dcc3b5aa765d61d8327deb882cf99 6 pa
# use just lowercase alpha
./md5brute.pl 5f4dcc3b5aa765d61d8327deb882cf99 6 pa abcdefghijklmnopqrstuvwxyz
# use just lowercase alpha and space
./md5brute.pl 5f4dcc3b5aa765d61d8327deb882cf99 6 pa abcdefghijklmnopqrstuvwxyz space
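Because the prefix and the variable length are separate parameters, one large search can be cut into independent work units by moving characters from the variable part into the prefix. The Python sketch below shows one possible way a controller could do the "map" step; `split_work` and `split_depth` are hypothetical names, and the slides do not describe how the real grid splits work.

```python
import itertools

def split_work(wanted, length, prefix, chars, split_depth=1):
    """Yield one parameter list per work unit.

    Each unit fixes `split_depth` more characters in the prefix and
    leaves `length - split_depth` characters for the node to search.
    """
    if not 0 <= split_depth <= length:
        raise ValueError("split_depth must be between 0 and length")
    for head in itertools.product(chars, repeat=split_depth):
        yield [wanted, str(length - split_depth), prefix + "".join(head), chars]
```

With a 26-letter set and `split_depth=2`, the `6 pa` search above becomes 676 units of the form `4 paXX`, each small enough to hand to a single core. The "reduce" step is then just collecting whichever unit reports a match. This sketch leaves out the optional `space` flag.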
14. Hardware?
#2 – Home lab?!
Mixed CPUs
1-4 cores per box
NAS storage (if needed)
Gigabit network
Noise level OK, but not for really long runs
15. Desktop virtualization
Bad Idea®
VirtualBox
• Linux host and Linux guest – disaster!
• LiveCD on bare metal - 250-300 sec/unit
• LiveCD on VirtualBox – 2300-3500 sec/unit
VMware – not tested yet
Xen – seems to be very close to bare-metal!
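To put the VirtualBox figures in perspective, the slowdown range can be worked out directly from the sec/unit numbers quoted above. The variable names in this short Python calculation are illustrative only.

```python
# Seconds per work unit, (fastest, slowest), from the measurements above
bare_metal = (250, 300)
virtualbox = (2300, 3500)

# Best case: fastest VirtualBox run against slowest bare-metal run
best_case = virtualbox[0] / bare_metal[1]
# Worst case: slowest VirtualBox run against fastest bare-metal run
worst_case = virtualbox[1] / bare_metal[0]

print(f"VirtualBox slowdown: {best_case:.1f}x to {worst_case:.1f}x")
# prints "VirtualBox slowdown: 7.7x to 14.0x"
```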
16. Cooling is a hot topic!
5 kW cooling in a small room…
… one node too many and all is out of control!
17. How about ‘cloud’?
Let’s take Amazon EC2 as an example:
AMIs available
Easy to build your own (runner)
Scale at will, pay per use – if you need to finish sooner than is possible in-house, the cost of using the cloud is well justified
Power/cooling/noise issues - outsourced!
18. EC2 on a budget
Standard on-demand vs. spot instances
• $0.085/h vs. $0.029/h - standard (1 EC2 CU)
• $0.680/h vs. $0.243/h - XL standard (8 EC2 CU)
• $0.170/h vs. $0.062/h - M high-cpu (5 EC2 CU)
• $0.680/h vs. $0.246/h - XL high-cpu (20 EC2 CU)
US - Virginia is cheaper than California or Europe
Find your best bang-for-the-buck before you start :)
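One way to compare bang-for-the-buck is to divide each hourly price by the instance's EC2 compute units. The Python sketch below does that for the prices listed above. It assumes the task scales linearly with EC2 CU, which may not hold for every workload, and `cost_per_cu` is a hypothetical helper name.

```python
# (instance type, EC2 compute units, on-demand $/h, spot $/h) from the list above
PRICES = [
    ("standard", 1, 0.085, 0.029),
    ("XL standard", 8, 0.680, 0.243),
    ("M high-cpu", 5, 0.170, 0.062),
    ("XL high-cpu", 20, 0.680, 0.246),
]

def cost_per_cu(prices):
    """Return (name, on-demand $/CU-hour, spot $/CU-hour), cheapest spot first."""
    rows = [(name, od / cu, spot / cu) for name, cu, od, spot in prices]
    return sorted(rows, key=lambda row: row[2])

for name, od, spot in cost_per_cu(PRICES):
    print(f"{name:12s} on-demand ${od:.4f}  spot ${spot:.4f} per CU-hour")
```

On these figures the two high-CPU types come out at roughly $0.012 per CU-hour on spot pricing, against about $0.03 for the standard types.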
19. Project summary
Time required:
- 3 work days (part time, between normal work) from initial idea to 100% working grid
Code not public (still in PoC/test/dev phase)
Future plans:
- more features
- proper admin console (early beta in testing)!
- port runner to EC2 or other platform (DONE!)
- make it public (free hosted service coming soon)