CIS 210 February 2013
Sun/Oracle Grid Engine is:
• A quick and easy way to set up a multi-cluster system using existing hardware
• Oracle Grid Engine is the most widely deployed workload management solution in the industry and offers unmatched scalability. On top of a rich set of advanced scheduling capabilities and the flexibility to adapt to any computing environment and application workload, Oracle Grid Engine offers comprehensive support for the cloud computing model.
How to Install
• Via webappl.blogspot.com:
  http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
Install SGE on master node:
    mpiuser@ub0:~$ sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
    # Remove gridengine-exec from the list if the master node is not supposed to run jobs.
    # During the installation, we need to set the cluster CELL name (such as 'default').
Install SGE on other nodes:
    mpiuser@ub1:~$ sudo apt-get install gridengine-client gridengine-exec
• Set the CELL name to the same value as on the master node.
Set SGE_ROOT and SGE_CELL
• Set the SGE_ROOT and SGE_CELL environment variables:
    $SGE_ROOT refers to the installation path of SGE.
    $SGE_CELL is the cell name, which is 'default' on our machine.
    Edit /etc/profile and /etc/bash.bashrc and add the following two lines:
    export SGE_ROOT=/var/lib/gridengine    # this is the path on our machines
    export SGE_CELL=default
    Source the script: source /etc/profile
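The edit above can be made repeatable with a small shell helper. This is a minimal sketch, assuming the same two values as on our machines; the function name add_sge_env is made up for illustration, and the demo runs against a scratch file rather than /etc/profile.

```shell
#!/bin/bash
# Hypothetical helper: append the two SGE export lines to a profile file,
# but only if they are not already there (so it is safe to run twice).
add_sge_env() {
    local profile="$1"
    local line
    for line in 'export SGE_ROOT=/var/lib/gridengine' 'export SGE_CELL=default'; do
        # -x: match the whole line, -F: fixed string, -q: quiet
        grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
    done
}

# Demonstrate on a scratch file; for real use, pass /etc/profile
# (and /etc/bash.bashrc) and run the script as root.
scratch="$(mktemp)"
add_sge_env "$scratch"
add_sge_env "$scratch"   # the second call adds nothing
cat "$scratch"
rm -f "$scratch"
```

Because the helper checks for each line before appending, re-running it after a package upgrade should not leave duplicate exports behind.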
Configure SGE with qmon
• This section is modified from a note by Junjun Mao.
• Invoke qmon as superuser:
    mpiuser@ub0:~$ sudo qmon
• On our machine, qmon failed to start due to missing fonts ('-adobe-helvetica-…').
• To solve the fonts problem:
    mpiuser@ub0:~$ sudo apt-get install xfs xfstt
    mpiuser@ub0:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
    mpiuser@ub0:~$ sudo reboot    # after the reboot, the problem is gone
Configure hosts
• "Host Configuration" -> "Administration Host" -> Add the master node and other administrative nodes
• "Host Configuration" -> "Submit Host" -> Add the master node and other submit nodes
• "Host Configuration" -> "Execution Host" -> Add the slave nodes
• Click on "Done" to finish
Configure the user
• Add or delete users that are allowed to access SGE here. In this example, a user is added to an existing group, and this group is later allowed to submit jobs. Everything else is left at its default value.
• "User Configuration" -> "Userset" -> Highlight the userset "arusers" and click on "Modify" -> Enter the user name in the "User/Group" field -> Click "Done" to finish
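The host and user setup can probably also be done without the GUI, through qconf. The sketch below only prints the commands (a dry run) so they can be reviewed first. The host name ub0, the user name mpiuser and the helper name plan_qconf are assumptions based on the examples in these slides. Execution hosts are left out because qconf -ae opens an editor.

```shell
#!/bin/bash
# Dry run: print qconf commands that mirror the qmon steps above.
# Nothing is executed; review the output, then run the lines as an SGE admin.
plan_qconf() {
    local master="$1" user="$2" userset="$3"
    echo "qconf -ah $master"          # add an administrative host
    echo "qconf -as $master"          # add a submit host
    echo "qconf -au $user $userset"   # add a user to a userset (access list)
}

plan_qconf ub0 mpiuser arusers
```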
Configure the queue
• While Host Configuration deals with what computing resources are available and User Configuration defines who has access to those resources, Queue Control defines how hosts and users are connected.
Queue Control
• "Queue Control" -> "Hosts" -> Confirm that the execution hosts show up there.
• "Queue Control" -> "Cluster Queues" -> Click on "Add" -> Name the queue and add the execution nodes to the Hostlist; then:
    "User Access" -> Allow access to the user group arusers;
    "General Configuration" -> Field "Slots" -> Raise the number to the total number of CPU cores on the slave nodes (it is OK to use a number bigger than the actual number of CPU cores).
• "Queue Control" -> "Queue Instances" -> This is the place to manually assign hosts to queues and to control the state (active, suspended, ...) of hosts.
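To pick a value for the "Slots" field, the core counts of the slave nodes can simply be added up. A minimal sketch, assuming a plain list of "host cores" pairs. The core counts below are made-up examples, and the function name total_slots is hypothetical.

```shell
#!/bin/bash
# Sum the second column of "host cores" lines read from standard input.
# "sum + 0" makes awk print 0 instead of an empty line for empty input.
total_slots() {
    awk '{ sum += $2 } END { print sum + 0 }'
}

# Made-up core counts for illustration; run `nproc` on each node for real values.
printf '%s\n' 'ub1 4' 'ub2 4' 'ub3 8' | total_slots
```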
Configure parallel environment
• "Queue Control" -> "Cluster Queues" -> Select a queue that will run parallel jobs -> Click on "Modify" -> "Parallel Environment" -> Click on the "PE" icon below the right and left arrows -> Click on "Add" -> Name the PE and set:
    slots = 999
    start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
    stop_proc_args = $SGE_ROOT/mpi/stopmpi.sh
    allocation_rule = $fill_up
    "Control slaves" = checked
• Make sure the configured PE is moved from "Available PE" to "Referenced PE".
• Confirm and close all config windows, then open "Queue Control" -> "Cluster Queues" -> "Parallel Environment" again; the named PE should show up.
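The same parallel environment can likely be described in a file and loaded with qconf -Ap instead of clicking through qmon. The sketch below only writes the file. The PE name mpi_pe and the output path are assumptions, and the fields not mentioned on the slide (user_lists, xuser_lists, job_is_first_task, urgency_slots, accounting_summary) are filled with assumed default values, so compare against the output of qconf -sp on your system before loading it.

```shell
#!/bin/bash
# Write a PE definition file that mirrors the qmon settings above.
# The heredoc delimiter is quoted so $SGE_ROOT, $pe_hostfile and $fill_up
# are written literally and are expanded later by SGE, not by this shell.
pe_file="${PE_FILE:-./mpi_pe.conf}"
cat > "$pe_file" <<'EOF'
pe_name            mpi_pe
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
stop_proc_args     $SGE_ROOT/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
echo "wrote $pe_file"

# To load it (as an SGE admin), something like:
#   qconf -Ap mpi_pe.conf
#   qconf -aattr queue pe_list mpi_pe <queue_name>
```

Keeping the definition in a file makes it easier to recreate the PE on a rebuilt master node than repeating the GUI steps.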
• Once created and linked to a queue, the PE can also be edited from "Queue Control" -> "PE".
Check whether SGE hosts are running properly
    mpiuser@ub0:~$ qhost        # should list the system info from all nodes
    mpiuser@ub0:~$ qconf -sel   # should list the hostnames of the execution nodes
    mpiuser@ub0:~$ qconf -sql   # should list the queues
    mpiuser@ub0:~$ ps aux | grep sge_qmaster | grep -v grep   # check the master daemon
    mpiuser@ub0:~$ ps aux | grep sge_execd | grep -v grep     # check the execution daemon
    mpiuser@ub1:~$ ps aux | grep sge_execd | grep -v grep     # check the execution daemon
    # If the sge_qmaster or sge_execd daemon is not running, try starting it as a service:
    # mpiuser@ub1:~$ sudo service gridengine-master start
    # mpiuser@ub1:~$ sudo service gridengine-exec start
    …
    # Reboot the node(s) if sge_qmaster or sge_execd fails to start
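The ps/grep checks above can be wrapped in a small function. This is a sketch, assuming the daemon process names sge_qmaster and sge_execd shown above; the function name daemon_status is made up.

```shell
#!/bin/bash
# Report whether a process with exactly this command name is running.
# `ps -e -o comm=` prints one command name per line without a header,
# and `grep -qx` matches a whole line, so no `grep -v grep` is needed.
daemon_status() {
    if ps -e -o comm= | grep -qx "$1"; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

daemon_status sge_qmaster   # should be running on the master node
daemon_status sge_execd     # should be running on every execution node
```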
Run a test script
• Make a script named 'test' with the content:
    #!/bin/bash
    ### Request bash as the shell for the job
    #$ -S /bin/bash
    ### Use the current directory as the working directory
    #$ -cwd
    ### Name the job:
    #$ -N test
    echo "Running environment:"
    env
    echo "============================="
    ### end of script
Job Submission
• To submit the job: qsub test
    # a job id is returned if successful
• To query the job status: qstat
    # If the job runs successfully, two output files are produced in the current working directory: test.oXXX (the standard output) and test.eXXX (the standard error), where test is the job name and XXX is the job id.
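Submission and the output-file naming can be tied together in a short script. The sketch below assumes that qsub supports the -terse option (which prints only the job id) and that the job script is the 'test' script from the previous slide. The helper name job_output_files is made up. The qsub call is skipped when qsub or the script is not found.

```shell
#!/bin/bash
# Print the two default output file names for a job name and job id.
job_output_files() {
    local name="$1" id="$2"
    echo "$name.o$id"   # standard output
    echo "$name.e$id"   # standard error
}

if command -v qsub >/dev/null 2>&1 && [ -f test ]; then
    # -terse makes qsub print only the job id (assumed to be available).
    job_id="$(qsub -terse test)"
    job_output_files test "$job_id"
else
    echo "qsub or the 'test' script not found; skipping submission" >&2
fi
```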
Always check your logs
• Check the log messages if an error occurs:
    mpiuser@ub0:~$ less /var/spool/gridengine/qmaster/messages      # master node
    mpiuser@ub0:~$ less /var/spool/gridengine/execd/ub0/messages    # exec node
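When the messages files get long, filtering helps. The sketch below assumes a pipe-separated layout in which one field holds a single-letter level such as E (error) or W (warning). That layout is an assumption and is not stated on these slides, so check a few lines of your own file first. The function name show_problems is made up.

```shell
#!/bin/bash
# Print only error- and warning-level lines from an SGE messages file.
# Assumes fields are separated by "|" and the level is a field of its own.
show_problems() {
    grep -E '\|(E|W)\|' "$1"
}

log=/var/spool/gridengine/qmaster/messages
if [ -r "$log" ]; then
    show_problems "$log" | tail -n 20   # last 20 matching lines
else
    echo "cannot read $log" >&2
fi
```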
Possible Errors
• Question: My output file has a "Warning: no access to tty (Bad file descriptor). Thus no job control in this shell."
  Answer: This warning appears if you are using tcsh or csh as the shell for submitting the job. It is safe to ignore this warning. Alternatively, you can use qsub -S /bin/bash to run your program in a different shell, or add the line '#$ -S /bin/bash' to the job script.
Possible Errors
• Question: The master host failed to respond properly. The error message is "error: commlib error: access denied (client IP resolved to host name 'ub0…'. This is not identical to clients host name 'ub0') error: unable to contact qmaster using port 6444 on host 'ub0'"
  Answer: Reboot the master node or install SGE from source code on the master node (solutions not confirmed yet). It could also be that the gethostname utility (full path '/usr/lib/gridengine/gethostname' on our machines) returns a different hostname from the one returned by the command 'hostname -f'. If this is the case (e.g., the host has multiple network interfaces), create a file named 'host_aliases' under '$SGE_ROOT/$SGE_CELL/common' and populate it as follows:
    # cat host_aliases
    ub0 ub0.my.com ub0-grid
    ub1 ub1.my.com ub1-grid
    ub2 ub2.my.com ub2-grid
    ub3 ub3.my.com ub3-grid
  Then restart the gridengine daemon (see the man page of sge_host_aliases for details). Check the aliases:
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0-grid
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0
    # both of them should return ub0
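To preview what the alias lookup should return before restarting the daemon, the file can be read with awk. This only illustrates the file layout (the first name on a line is the primary name, the rest are aliases) and is not SGE's own resolver. The function name primary_name is made up, and the default paths are the ones from our machines.

```shell
#!/bin/bash
# Print the primary host name (first column) for a name or alias found in a
# host_aliases-style file. Lines starting with "#" are skipped.
primary_name() {
    local file="$1" name="$2"
    awk -v n="$name" '
        /^#/ { next }
        { for (i = 1; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

aliases="${SGE_ROOT:-/var/lib/gridengine}/${SGE_CELL:-default}/common/host_aliases"
if [ -r "$aliases" ]; then
    primary_name "$aliases" ub0-grid   # should print ub0 with the file above
fi
```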
Sources
• http://manpages.ubuntu.com/manpages//jaunty/man5/sge_conf.5.html
• http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
• http://pka.engr.ccny.cuny.edu/~jmao/node/49
• http://webappl.blogspot.com/2011/05/setting-up-mpich2-cluster-with-ubuntu.html

Working with Oracle/Sun Grid Engine