Face-cluster

Job submission

Introduction

The job submission system is Torque/MAUI.

Jobs are submitted with the qsub command to the single queue, which gives access to all the resources available in that queue (see the system configuration page).

Recommendations

To ensure fair use of the cluster, please follow the recommendations below.

QUEUE QUIET: for sequential jobs or job arrays

  • Long (>12h00) sequential jobs: up to 64 cores.
  • Short (<5h00) sequential jobs: no limit. For jobs of 8h00 to 12h00, however, it is preferable to submit them at the end of the day so that they run overnight.
  • The slot limit for job arrays is now 64 cores.
  • Each user should not take more than 40% of the resources available in this queue.

These rules ensure that as many users as possible can work on the cluster. They may be applied flexibly depending on the system load (memory, CPU, etc.) on the cluster.

If you have an exceptional request, please contact us via https://supportapc.in2p3.fr

Monitoring

  • Cluster status

Information on the cluster load is available on the web interface of our monitoring tool, Ganglia, at http://www.apc.univ-paris7.fr/ganglia

You can also check the status of the nodes with the pbsnodes command.

  • Job status

Use qstat to check job status:

  • Q: queued
  • C: completed
  • E: exiting
  • H: held
  • R: running
  • S: suspended

You can also use /usr/local/maui/bin/showq to display information about active, eligible, blocked, and/or recently completed jobs.
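
To get a quick summary of how many jobs are in each state, the output of qstat can be piped through a small filter. The sketch below assumes the default qstat listing, with two header lines and the state letter in the second-to-last column; the function name count_job_states is hypothetical, and the column layout may differ when qstat is called with other options.

```shell
#!/bin/bash
# Count jobs per state letter (Q, R, C, ...) from a default `qstat` listing.
# Assumption: two header lines, state letter in the second-to-last column.
count_job_states() {
    awk 'NR > 2 && NF >= 2 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Only call qstat where it is installed (i.e. on the cluster).
if command -v qstat >/dev/null 2>&1; then
    qstat | count_job_states
fi
```

Each output line gives a state letter followed by the number of jobs in that state.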

A simple job example

The qsub command takes a number of command-line arguments, which can also be specified in a job script, together with the call to the actual computation code:

myjob.sh:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q quiet
#PBS -M email@address
#PBS -l nodes=1:ppn=4,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
echo "Hello world! "
cd my_working_dir
./my_executable_file 

The command qsub myjob.sh will:

  • submit the job to the queue
  • execute it once 4 processors and 4 GB of memory are available
  • create a temporary directory referenced in the $SCRATCH environment variable
  • run the executable file
  • write the standard output to the out_file file
  • write the standard error to the err_file file

Use qdel to remove a job from the queue.

Note that it is important to declare the total memory of your job (regardless of the number of "ppn" requested) with the "mem" parameter; otherwise the job gets the default value of "4gb" set for the quiet queue.
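
Because "mem" is the total for the whole job rather than a per-core amount, it can help to compute it from what each process needs. This is a minimal sketch, assuming a hypothetical single-node job in which every process needs the same amount of memory; the function name pbs_resources and the 2 GB figure are illustrative only.

```shell
#!/bin/bash
# Build a "-l" resource string carrying the TOTAL memory of a single-node job.
# Arguments: cores per node (ppn), memory per process in GB, walltime.
pbs_resources() {
    local ppn="$1"
    local gb_per_proc="$2"
    local walltime="$3"
    local total_gb=$(( ppn * gb_per_proc ))
    echo "nodes=1:ppn=${ppn},mem=${total_gb}gb,walltime=${walltime}"
}

# Example: 4 processes needing 2 GB each, so 8 GB are requested in total.
pbs_resources 4 2 24:00:00
```

The resulting string can be used in the #PBS -l line of the job script or passed on the command line, for example qsub -l "$(pbs_resources 4 2 24:00:00)" myjob.sh.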

Interactive submission

Jobs can also be submitted interactively with the qsub -I command, which makes it possible to:

  • debug more easily
  • perform interactive computations on declared and reserved resources
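
As an illustration, an interactive session accepts the same resource requests as a batch job. The sketch below assembles such a request; the values (one core, 4 GB, one hour) are examples rather than site defaults, and the session is only started when qsub is installed and the shell is attached to a terminal.

```shell
#!/bin/bash
# Resource request for a short interactive session on the quiet queue.
# The values are illustrative; adapt them to your needs.
resources="nodes=1:ppn=1,mem=4gb,walltime=01:00:00"

# Show the command that is about to be run.
echo "qsub -I -q quiet -l $resources"

# Start the session only where qsub exists and stdin is a real terminal.
if command -v qsub >/dev/null 2>&1 && [ -t 0 ]; then
    qsub -I -q quiet -l "$resources"
fi
```

Leaving the interactive shell with exit ends the job and releases the reserved resources.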

MPI, OpenMP and hybrid jobs

MPI job sample script:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q quiet
#PBS -M email@address
#PBS -l nodes=5:ppn=2,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
export PATH=/usr/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/lib/:/usr/local/openmpi/lib/openmpi/:$LD_LIBRARY_PATH
/usr/local/openmpi/bin/mpirun -mca plm rsh  my_mpi_job
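
The number of MPI processes normally matches the number of reserved cores (here 5 nodes × 2 cores = 10). Torque lists the reserved slots in the file named by $PBS_NODEFILE, one line per core, so the count can be derived rather than hard-coded. In this minimal sketch the helper name count_slots is hypothetical, and the mpirun line is left commented out because my_mpi_job is a placeholder; whether an explicit -np is needed depends on how Open MPI was built on the cluster.

```shell
#!/bin/bash
# Count the cores reserved for the job: one non-empty line per slot.
count_slots() {
    grep -c . "$1"
}

# PBS_NODEFILE is only set inside a running Torque job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "Running on $NP cores"
    # /usr/local/openmpi/bin/mpirun -np "$NP" -mca plm rsh my_mpi_job
fi
```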

OpenMP sample script:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q quiet
#PBS -M email@address
#PBS -l nodes=1:ppn=16,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
# one thread per requested core (ppn=16)
export OMP_NUM_THREADS=16
./my_omp_job
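
A common convention is to run one OpenMP thread per reserved core. Recent Torque versions export the ppn value as PBS_NUM_PPN; whether that variable is available on this cluster is an assumption, so the sketch below falls back to a single thread when it is not set. The function name threads_for_job is hypothetical.

```shell
#!/bin/bash
# Return the thread count to use: one thread per reserved core.
# PBS_NUM_PPN is assumed to be exported by Torque; default to 1 otherwise.
threads_for_job() {
    echo "${PBS_NUM_PPN:-1}"
}

OMP_NUM_THREADS=$(threads_for_job)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```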

Background job

A job can be run in the background by using the #PBS nice option and requesting 0 CPUs.

Job arrays

Job arrays are submitted through the -t option. The PBS_ARRAYID environment variable can be used as a job id number (or seed).

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q quiet
#PBS -M email@address
#PBS -l nodes=1:ppn=16,mem=4gb,walltime=24:00:00
#PBS -t 0-99
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
export SEED=$PBS_ARRAYID
./my_executable_file $SEED
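
Besides serving as a seed, PBS_ARRAYID can select a different input for each task. This is a minimal sketch, assuming a hypothetical file params.txt with one parameter set per line, so that task 0 reads line 1, task 1 reads line 2, and so on; the function name params_for_task is also hypothetical.

```shell
#!/bin/bash
# Print the parameter line that belongs to a given array index (0-based).
# Arguments: array index, parameter file.
params_for_task() {
    sed -n "$(( $1 + 1 ))p" "$2"
}

# Inside an array job, PBS_ARRAYID holds the index of the current task.
if [ -n "${PBS_ARRAYID:-}" ] && [ -r params.txt ]; then
    PARAMS=$(params_for_task "$PBS_ARRAYID" params.txt)
    echo "Task $PBS_ARRAYID uses: $PARAMS"
    # ./my_executable_file $PARAMS
fi
```

With #PBS -t 0-99, params.txt would need at least 100 lines so that every task finds its own parameters.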

Matlab

#!/bin/bash
#PBS -N myjobname
#PBS -e err_file
#PBS -l nodes=1:ppn=1,mem=4gb,walltime=24:00:00
cd my_working_dir
./path_of_matlab_bin/matlab -nodesktop -nosplash < input.m > output

More information: http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml

Notification by mail

To receive a notification email, use #PBS -m abe together with #PBS -M user@apc.univ-paris7.fr (the -m option selects the events, the -M option gives the address).

The letters given to -m tell the scheduler when to send you an email:

  • a: mail is sent when the job is aborted by the batch system.
  • b: mail is sent when the job begins execution.
  • e: mail is sent when the job terminates.