Apc-cluster

Job submission on APC cluster

Introduction

The job submission system is Torque/MAUI.

Jobs are submitted with the qsub command to the single queue, which gives access to all the resources available to that queue (see the system configuration page).

Recommendations

To keep the cluster usable for everyone, please follow the recommendations below.

QUEUE FURIOUS: for OpenMP or OpenMPI parallel jobs.

  • for a long (> 96h00) parallel job: no more than 64 cores.
  • no limit for short (< 5h00) parallel jobs. Jobs lasting 8h00 to 12h00 should preferably be submitted at the end of the day so that they run overnight.
  • each user of the queue should take no more than 40% of the resources available in this queue.

These rules let as many users as possible work on the cluster. They may be applied flexibly depending on the system load (memory, CPU, etc.) on the cluster.

If you have an exceptional request, please contact us via https://supportapc.in2p3.fr

Monitoring

  • Cluster status

Information about the cluster load is available on the web interface of our monitoring tool, Ganglia, at http://www.apc.univ-paris7.fr/ganglia/.

You can also check the status of the nodes with the pbsnodes command.
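
As a rough sketch, the number of nodes in each state can be summarised by filtering the pbsnodes output. The helper name count_node_states is hypothetical, and the script assumes the usual pbsnodes -a layout, in which each node block contains an indented line of the form "state = free":

```shell
#!/bin/bash
# Hypothetical helper: count nodes per state from `pbsnodes -a` output on stdin.
# Assumes each node block contains a line of the form "     state = <value>".
count_node_states() {
    awk '$1 == "state" && $2 == "=" { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

On the cluster it would be called as: pbsnodes -a | count_node_states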

  • Job status

Use qstat to check job status. The job state is reported as one of the following letters:

  • Q: queued
  • C: completed
  • E: exiting
  • H: held
  • R: running
  • S: suspended.
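
The state letters above can be tallied with a small filter. This is only a sketch: the helper name count_job_states is hypothetical, and it assumes the default qstat layout (two header lines, then one job per line with the state letter in the fifth column). The layout differs when options such as -u or -a are used.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state from default `qstat` output on stdin.
# Assumes two header lines and the state letter (Q, R, C, ...) in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 6 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

On the cluster it would be called as: qstat | count_job_states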

You can also use /usr/local/maui/bin/showq to display information about active, eligible, blocked, and/or recently completed jobs.

All Maui diagnostic tools (showres, checkjob, checknode, diagnose) are available in /usr/local/maui/bin/.

The Torque tools (tracejob, pbsnodes) are in /usr/local/bin/.

A simple job example

The qsub command takes a number of command-line arguments, which can also be specified in a job script together with the call to the actual computation code:

myjob.sh:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q furious
#PBS -M email@address
#PBS -l nodes=1:ppn=4,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
echo "Hello world! "
cd my_working_dir
./my_executable_file 

The command qsub myjob.sh will:

  • submit the job to the queue
  • execute it when 4 processors and 4 GB of memory are available
  • create a temporary directory referenced by the $SCRATCH environment variable. For array jobs, use the following line instead, which strips the square brackets from the job ID:
export SCRATCH="/scratch/$USER.${PBS_JOBID//[\[\]]/}"
  • run the executable file
  • print the standard output to the out_file file
  • print the standard error to the err_file file
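
The bracket-stripping expansion used for array jobs can be tried outside the batch system. The sketch below sets PBS_JOBID by hand; the "1234[5].server" form is an assumption about what the local array job IDs look like, and the variable clean_id is introduced only for illustration.

```shell
#!/bin/bash
# Illustrative array-job ID; the real value is set by Torque inside the job.
# The "1234[5].server" form is an assumption about the local ID format.
PBS_JOBID='1234[5].server'
USER=${USER:-someuser}

# ${VAR//pattern/} deletes every match of pattern; [\[\]] matches "[" or "]".
clean_id=${PBS_JOBID//[\[\]]/}
export SCRATCH="/scratch/$USER.$clean_id"

echo "$clean_id"   # prints 12345.server
```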

Use qdel to remove a job from the queue.

Note that it is important to declare the total memory of your job (regardless of the "ppn" value) by setting the "mem" parameter; otherwise the default value of "8gb" is used.
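
Because "mem" is the total for the job, a per-process estimate has to be multiplied by the number of processes before it goes into the resource list. A minimal sketch, assuming a single node and a memory estimate in whole gigabytes; the helper name resource_request is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: build a resource list where mem is the job's TOTAL memory.
# Usage: resource_request PPN GB_PER_PROCESS WALLTIME
resource_request() {
    local ppn="$1"
    local gb_per_proc="$2"
    local walltime="$3"
    # Total memory = processes per node x memory per process (single node).
    local total_gb=$(( ppn * gb_per_proc ))
    echo "nodes=1:ppn=${ppn},mem=${total_gb}gb,walltime=${walltime}"
}
```

For example, resource_request 4 1 24:00:00 yields the resource list used in myjob.sh, and the result can be passed to qsub with the -l option.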

Interactive submission

Jobs can also be submitted interactively with the qsub -I command, which makes it possible to:

  • ease debugging
  • perform interactive computations on declared and reserved resources

Interactive submission with X display:

  • qsub -I -X

MPI, OpenMP and hybrid jobs

MPI job sample script:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q furious
#PBS -M email@address
#PBS -l nodes=5:ppn=2,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
export PATH=/usr/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/lib/:/usr/local/openmpi/lib/openmpi/:$LD_LIBRARY_PATH
/usr/local/openmpi/bin/mpirun  my_mpi_job 

Warning: do not use the '-mca plm rsh' option on the furious queue, which uses a new TCP message protocol.
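
The sample script above calls mpirun without a process count, which suggests that it picks up the allocation by itself. If the count is needed explicitly, it can be derived from the $PBS_NODEFILE file that Torque provides inside a job, with one line per allocated slot (10 lines for nodes=5:ppn=2). A sketch, with the hypothetical helper name mpi_slot_count:

```shell
#!/bin/bash
# Hypothetical helper: number of slots granted by Torque to the current job.
# $PBS_NODEFILE lists one host name per allocated slot, so counting lines
# gives nodes x ppn.
mpi_slot_count() {
    wc -l < "$PBS_NODEFILE" | tr -d ' '
}
```

Inside a job script this could be used as: /usr/local/openmpi/bin/mpirun -np "$(mpi_slot_count)" my_mpi_job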

OpenMP sample script:

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q furious
#PBS -M email@address
#PBS -l nodes=1:ppn=16,mem=4gb,walltime=24:00:00
export SCRATCH="/scratch/$USER.$PBS_JOBID" 
export OMP_NUM_THREADS=32
./my_omp_job

Background job

A background job is obtained by using the #PBS nice option and requesting 0 CPUs.

Job arrays

Job arrays are submitted through the -t option. The PBS_ARRAYID environment variable can be used as a job id number (or seed).

#!/bin/bash
#PBS -N myjobname
#PBS -o out_file
#PBS -e err_file
#PBS -q furious
#PBS -M email@address
#PBS -l nodes=1:ppn=16,mem=4gb,walltime=24:00:00
#PBS -t 0-99
export SCRATCH="/scratch/$USER.${PBS_JOBID//[\[\]]/}"
export SEED=$PBS_ARRAYID
./my_executable_file $SEED
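
Besides serving as a seed, PBS_ARRAYID can select one input per task. The sketch below reads the matching line from a parameter file; the helper name param_for_task and the file name params.txt are hypothetical, and the 0-based indexing is chosen to match the "-t 0-99" range above.

```shell
#!/bin/bash
# Hypothetical helper: print the line of a parameter file for one array task.
# Usage: param_for_task FILE [INDEX]   (INDEX defaults to $PBS_ARRAYID)
param_for_task() {
    local file="$1"
    local index="${2:-$PBS_ARRAYID}"
    # Task 0 reads line 1, task 1 reads line 2, and so on.
    sed -n "$(( index + 1 ))p" "$file"
}
```

In the job script the call would look like: ./my_executable_file "$(param_for_task params.txt)"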

Notification by mail

To receive notification emails, set the mail events with -m and the recipient address with -M:

#PBS -m abe
#PBS -M user@apc.univ-paris7.fr

The letters given to -m tell the batch system when to send mail:

  • a: mail is sent when the job is aborted by the batch system.
  • b: mail is sent when the job begins execution.
  • e: mail is sent when the job terminates.