Apc-cluster

Context

System upgrade

The APC cluster has been created to replace the Arago cluster, which has been in operation for more than four years.

The operating system on the APC cluster has been upgraded from Scientific Linux 6 to CentOS 7.

All system libraries are up to date, as is software such as the Torque resource manager, the Maui scheduler, the compilers, etc.

Automated cluster configuration

With the configuration automation tool "Puppet", the configuration of the APC cluster is now verified continuously and automatically.

In addition, when a new node is installed or an existing node is reinstalled, its configuration can be set up much faster than before.

Migration phase

The migration is split into two phases.

  • Phase 1 (2016/03/07 - 2016/03/11)

Plan:

Create the new master node (front-end machine) "apcclm".

Migrate all the compute nodes of the Furious queue to the APC cluster; they will be renamed "apccl01-apccl12".
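For scripts that need to address the migrated nodes, the new names can be generated rather than typed out. The following is a minimal Python sketch, assuming that "apccl01-apccl12" means twelve zero-padded names from apccl01 through apccl12; the helper name `apc_node_names` is hypothetical and not part of any cluster tooling.

```python
def apc_node_names(first=1, last=12, prefix="apccl"):
    """Return zero-padded node names from <prefix><first> to <prefix><last>."""
    # Assumption: node numbers are contiguous and padded to two digits.
    return ["{}{:02d}".format(prefix, i) for i in range(first, last + 1)]


names = apc_node_names()
print(names[0], names[-1], len(names))  # prints: apccl01 apccl12 12
```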

Important:

1. The Arago cluster, still on SL6, will keep running with the master node "apcclwn12" and its only queue, Quiet.

2. The APC cluster will have only one queue, Furious, during this phase.

3. The two clusters will share the same storage pool (/home, /workdir, etc.), so you don't need to copy data between the two clusters.

4. Since all libraries have been upgraded on the APC cluster, you will have to recompile everything, including your software and the binaries used by your jobs.

Take care not to overwrite the current versions of your software and binaries with the new ones; otherwise you will no longer be able to work on the Arago cluster.

See "Alternate between two clusters" for more details.
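Because both clusters share the same /home, one way to avoid overwriting the Arago builds is to keep a separate build directory per operating system. The sketch below is only an illustration under stated assumptions: it assumes that /etc/redhat-release exists on both clusters and contains a string of the form "... release 6.x ..." on SL6 and "... release 7.x ..." on CentOS 7. The tags `arago-sl6` and `apc-centos7`, the function names, and the `~/soft` layout are hypothetical choices, not site conventions.

```python
import re
from pathlib import Path

# Hypothetical tags: one build directory per cluster operating system.
TAGS = {6: "arago-sl6", 7: "apc-centos7"}


def cluster_tag(release_text):
    """Map the text of /etc/redhat-release to a per-cluster tag."""
    match = re.search(r"release\s+(\d+)", release_text)
    if match is None:
        raise ValueError("unrecognised release string: {!r}".format(release_text))
    major = int(match.group(1))
    if major not in TAGS:
        raise ValueError("unexpected major release: {}".format(major))
    return TAGS[major]


def build_dir(base, release_text):
    """Return the build directory to use under `base` for this release."""
    return Path(base) / cluster_tag(release_text)


def current_build_dir(base):
    """Same as build_dir, reading the release file of the current machine."""
    # Assumption: the file is present on both SL6 and CentOS 7 nodes.
    text = Path("/etc/redhat-release").read_text()
    return build_dir(base, text)
```

Calling `current_build_dir(Path.home() / "soft")` on a node of either cluster would then give a distinct install prefix, so a build made on the APC cluster never lands on top of the one still used on Arago.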

  • Phase 2 (Autumn 2016)

Plan:

Migrate the rest of the nodes, which are currently in the Quiet queue, to the APC cluster.