Jumping Bean 
Map Reduce With Bash 
(the power of the Unix philosophy)
About Me 
● Solutions integrator at Jumping Bean 
– Developer & Trainer 
– Technologies 
● Java 
● PHP 
● HTML5/Javascript 
● Linux 
– What I am planning to do: 
● The Internet of things
Map/Reduce with Bash 
● Purpose of this presentation is: 
– to demonstrate the power and flexibility of the Unix 
philosophy, 
– to show what awesome solutions can be created by using simple 
bash scripts and userland tools, 
– to show off cool utilities and tools 
● The purpose is not: 
– to suggest that Map/Reduce is best done with bash 
– bash is only the best fit given the constraints – see business problem
Unix Philosophy 
“is a set of cultural norms and philosophical 
approaches to developing small yet capable 
software” - Wikipedia
Unix Philosophy 
“Early Unix developers were important in bringing 
the concepts of modularity and reusability into 
software engineering practice, spawning a 
'software tools' movement” - Wikipedia
Business Problem 
● Nuclear Engineering department needs to run Monte Carlo methods 
on data to calculate something to do with core temperature of nuclear 
reactors :), 
● Post-grad students need to run analysis as part of their course work, 
● Analysis can take days or weeks to run, 
● University has invested in a 900-node cluster, 
● Cluster used for research when not used by students, 
● Tool used for analysis is 
– written in Fortran, 
– single-threaded, 
● No money for a fancy-pants solution
Business Problem 
● As-Is system 
– Professor uses laptop and desktop, 
– Manually starts application with simple script, 
– Start script x number of times where x=number of 
cores, 
– Waits for days, 
– Manually checks progress, 
– Not scalable to 900 nodes!
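The as-is approach on a single machine can be sketched roughly as below. This is an illustration only: `start_run` is a hypothetical stand-in for the real start script, and the temporary directory exists just so the sketch can run anywhere.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for the start script that launches the Fortran tool.
start_run() {
    echo "run $1 done" > "$workdir/run_$1.log"
}

# Number of online cores; falls back to 2 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

i=1
while [ "$i" -le "$cores" ]; do
    start_run "$i" &      # one background process per core
    i=$(( i + 1 ))
done
wait                      # block until every background run has exited

started=$(ls "$workdir" | wc -l | tr -d ' ')
echo "started $started runs on $cores cores"
```

This works for one laptop or desktop, but someone still has to log in to each machine and check on it, which is what stops it scaling to 900 nodes.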
Business Problem 
● Unknowns 
– How is the 900-node cluster set up, i.e. is it using any cluster software or virtualisation? 
● OpenStack? 
● OpenNebula? 
● KVM? 
– Tools available to IT department, i.e. how they do deploys, monitoring, user 
management etc. 
● Requirements 
– Independence from IT department or experts for help, 
– Student & lecturer IT skills are limited to Fortran & some bash scripting, 
– Due to security concerns, prevent IT staff from gaining access to research, 
● Keep it simple – Proof of concept
What is Map/Reduce? 
● Programming model for 
– Processing and generating large datasets, 
– Using a parallel, distributed algorithm, 
– On a cluster or set of distributed nodes 
● Popularised by Google and the advent of cloud computing 
● Apache Hadoop – full-blown map/reduce framework. Used 
to analyse your social media data, “understand the 
customer” and by numerous agencies with three-letter 
acronyms. 
– “Really, we are only trying to help you know yourself better”
Map/Reduce Steps 
● Map – Master node takes large dataset and 
distributes it to compute nodes to perform 
analysis on. The compute nodes return a result, 
● Reduce – Gather the results of the compute 
nodes and aggregate results into final answer
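The two steps can be tried out on a single machine before any cluster is involved. The sketch below is a toy under stated assumptions: the dataset is just the numbers 1 to 1000, the "analysis" is a sum computed with awk, and each loop iteration stands in for a compute node.

```shell
# Hypothetical work directory and dataset: the numbers 1..1000, one per line.
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/dataset.txt"

# Map, part 1: cut the dataset into chunks of 250 lines (chunk_aa, chunk_ab, ...).
split -l 250 "$workdir/dataset.txt" "$workdir/chunk_"

# Map, part 2: each "compute node" (here a loop iteration) returns a partial result.
for chunk in "$workdir"/chunk_*; do
    awk '{ sum += $1 } END { print sum }' "$chunk" > "$chunk.result"
done

# Reduce: gather the partial results and aggregate them into the final answer.
total=$(cat "$workdir"/chunk_*.result | awk '{ sum += $1 } END { print sum }')
echo "total=$total"

rm -rf "$workdir"
```

The structure stays the same when the loop body is replaced by work sent to remote nodes: split, process each piece independently, then combine.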
What we need 
● Controller node functions 
– to distribute data to nodes, 
– execute calculation functions 
– collect results 
● Management node functions 
– distribute application and scripts to compute nodes, 
● Compute node functions 
– Scripts to run the single-threaded application in parallel on multi-core processors 
● Security Requirements 
– Prevent system administrators from gaining access to core application, script or 
data
Controller Functions 
● How to distribute files to a node (map), execute calculations & gather 
results (reduce)? 
– Use split to split input files, 
– Use ssh to distribute files, execute processes, 
● How to do this to multiple (900) nodes? 
– Use parallel ssh (pssh), parallel scp (pscp), 
● Issues: 
– Copying public key to 900 machines? 
– Give each student their own account? 
● Solution 
– Set up LDAP authentication (password based) or 
– Include controller node's root public key in compute node image, distribute secondary keys via scripts 
using pssh 
– Fancy pants – Chef, Ansible
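A controller script along these lines might look like the sketch below. Everything specific in it is an assumption for illustration: the host names, the `student` account, the `input/` and `results` directories and `run_analysis.sh` are all hypothetical. It runs in dry-run mode by default, so it only prints the scp and pssh commands it would issue. On some distributions the pssh binary is installed as `parallel-ssh`.

```shell
# Dry-run helper: print the command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Map: hand the chunks out round-robin, one scp per chunk.
# $1 = file with one host name per line, remaining arguments = chunk files.
distribute_chunks() {
    hosts_file=$1
    shift
    n_hosts=$(wc -l < "$hosts_file" | tr -d ' ')
    i=0
    for chunk in "$@"; do
        host=$(sed -n "$(( i % n_hosts + 1 ))p" "$hosts_file")
        run scp "$chunk" "student@$host:input/"
        i=$(( i + 1 ))
    done
}

# Execute on every node at once. -t 0 disables the timeout (runs can take days),
# -o saves each node's standard output in a file under results/.
run_everywhere() {
    run pssh -h "$1" -l student -t 0 -o results './run_analysis.sh input/'
}

# Demo with three hypothetical nodes and five chunks (e.g. produced by split).
demo_hosts=$(mktemp)
printf '%s\n' node001 node002 node003 > "$demo_hosts"
distribute_chunks "$demo_hosts" chunk_aa chunk_ab chunk_ac chunk_ad chunk_ae
run_everywhere "$demo_hosts"
```

Plain scp is used for the map step because each node receives a different chunk. pssh and pscp fit best when every node gets the same command or the same file, such as the scripts themselves.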
Management Node Functions 
● Use parallel ssh to distribute scripts from 
management node to compute nodes, 
● Using Ansible or Chef could be a next 
evolutionary step to automate system 
maintenance 
Compute Node Functions 
● Basically a bash script - how to parallelise a single-threaded 
application to use multiple cores on modern CPUs? 
● xargs 
– pass through list of input files, 
– -n sets the number of input files per invocation (1 = one input file each), 
– -P sets the number of processes to run in parallel, 
– Script waits for completion of processing & checks output 
● GNU parallel 
– Can run commands in parallel using 1 or more hosts 
– More options for target input placement {}, string replacement 
– Can pass output as input to another process
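A minimal sketch of the xargs variant follows. The input files are generated on the spot, and the "analysis" is only a line count that stands in for the single-threaded Fortran tool. The file names and the `.out` suffix are assumptions made for the example.

```shell
# Hypothetical input: eight files, where input_N.dat contains N lines.
workdir=$(mktemp -d)
i=1
while [ "$i" -le 8 ]; do
    seq 1 "$i" > "$workdir/input_$i.dat"
    i=$(( i + 1 ))
done

# Number of online cores; falls back to 2 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# -n 1 passes one input file to each invocation, -P runs up to $cores at once.
# find -print0 with xargs -0 keeps unusual file names intact.
find "$workdir" -name 'input_*.dat' -print0 |
    xargs -0 -n 1 -P "$cores" sh -c 'wc -l < "$1" | tr -d " " > "$1.out"' _

# xargs only returns once every process it started has finished,
# so the outputs can be checked straight away.
done_count=$(find "$workdir" -name '*.out' | wc -l | tr -d ' ')
largest=$(cat "$workdir"/input_*.dat.out | sort -n | tail -n 1)
echo "finished $done_count runs, largest input had $largest lines"
```

With GNU parallel installed, a roughly equivalent call would be `parallel -j "$cores" ./analyse.sh {} ::: "$workdir"/input_*.dat`, where `./analyse.sh` is again a hypothetical wrapper around the real tool.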
Compute/Controller Node 
● At the end of the compute node process, either 
– the compute node pings the controller node, or 
– the controller node waits for pssh to return, 
● The controller then carries out the next step, i.e. the reduce process, or starts the next 
script with the output of the 1st step as input to the 2nd step, 
● Check for errors and reschedule failed 
computes
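Checking for errors and rescheduling can be as simple as a retry loop on the controller. In the sketch below, `run_chunk` is a hypothetical stand-in for "run this chunk on a compute node". It deliberately fails once for `chunk_b` so the retry path is exercised. The chunk names and the limit of three attempts are likewise assumptions.

```shell
workdir=$(mktemp -d)
MAX_ATTEMPTS=3

# Hypothetical stand-in for a remote compute job.
# Fails on the first attempt for chunk_b to simulate a node error.
run_chunk() {
    chunk=$1
    if [ "$chunk" = chunk_b ] && [ ! -e "$workdir/$chunk.tried" ]; then
        touch "$workdir/$chunk.tried"
        return 1
    fi
    echo "result for $chunk" > "$workdir/$chunk.out"
}

pending="chunk_a chunk_b chunk_c"
attempt=1
while [ -n "$pending" ] && [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    failed=""
    for chunk in $pending; do
        if ! run_chunk "$chunk"; then
            echo "attempt $attempt: $chunk failed, rescheduling" >&2
            failed="$failed $chunk"
        fi
    done
    pending=$failed           # only the failed chunks go into the next round
    attempt=$(( attempt + 1 ))
done

if [ -n "$pending" ]; then
    echo "giving up on:$pending" >&2
fi
```

In a real setup the success test would be the exit status of the remote command, or the presence of the expected output file on the node.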
Security 
● Each student should have a separate account 
– Linux multi-user system. User home directory for storing files and results 
● Each user should be limited in resource usage 
– Simple 
● ulimit 
● psacct 
– Advanced 
● Cgroups 
● Namespaces 
● Students can execute but not read the bash script file (special permissions) 
– Use sudo or 
– Linux capabilities 
● setcap – e.g. setcap "cap_kill=+ep" script.sh (note: file capabilities only take effect on binaries, not on interpreted scripts)
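As a small illustration of the "simple" option, ulimit can cap resources for a single job by setting the limits inside a subshell, which leaves the parent shell untouched. The values of one hour of CPU time and 256 open files are arbitrary choices for the example.

```shell
# Limits set inside $( ... ) apply only to that subshell and its children.
limits=$(
    ulimit -t 3600    # at most 3600 seconds of CPU time
    ulimit -n 256     # at most 256 open file descriptors
    # The real job would be started here and would inherit both limits.
    echo "$(ulimit -t) $(ulimit -n)"
)
echo "limits inside the job: $limits"
echo "CPU time limit in the parent shell: $(ulimit -t)"
```

Limits that must hold for every login session are usually configured system-wide instead (for example through PAM limits), which is a step beyond this sketch.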
Security 
● Limit the root user 
– Linux capabilities 
● setcap, capsh, pscap 
● Disable root account – grant CAP_SYS_ADMIN as 
needed, 
● /etc/security/capability.conf
Resources 
● Parallel SSH, 
● Xargs, 
● GNU parallel, 
● cgroups, 
● namespaces, 
● Linux capabilities 
● Twitter - @mxc4 
● Gplus – Mark Clarke 
● Jumping Bean 
● Cyber Connect 
● Jozi Linux User Group 
● Jozi Java User Group 
● Maker Labs
