Doc:Code-Aster-Cluster-Config
To use MPI with Code-Aster in CAELinux 2011, you don't need (and should not install) MPICH2 or any other MPI library. Everything is already there: Code-Aster 11.0 is compiled against the OpenMPI libraries, and having several MPI libraries installed on the system may create configuration problems.
Personally, this is how I proceed, starting from 2 PCs with a fresh install of CAELinux 2011 (this also works in LiveDVD/LiveUSB mode). Here is a small "How To":
1) set up the network so the machines can reach each other: I use Network Manager to set static IP addresses. Then set the hostnames:
on machine 1:
sudo hostname caepc1
on machine 2:
sudo hostname caepc2
2) edit /etc/hosts on both machines to define the host/IP relationships
sudo nano /etc/hosts
add lines such as these after the 127.0.1.1 xxxx line:
192.168.0.1 caepc1
192.168.0.2 caepc2
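A quick way to confirm that both names resolve is to query them on each machine. This is only a minimal sketch: `check_hosts` is a hypothetical helper name, and it relies on the standard `getent` tool rather than on anything specific to CAELinux.

```shell
# Hypothetical helper: print the address each hostname resolves to.
# Usage: check_hosts HOST [HOST...]
check_hosts() {
    for h in "$@"; do
        # getent returns non-zero when the name cannot be resolved
        if ip=$(getent hosts "$h"); then
            # keep only the first field (the IP address)
            echo "$h -> ${ip%% *}"
        else
            echo "$h -> NOT RESOLVED"
        fi
    done
}
```

Running `check_hosts caepc1 caepc2` on both machines should print the addresses entered in /etc/hosts.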
3) edit your configuration settings directly in /opt/aster110/etc/codeaster/aster-mpihosts
for example (use OpenMPI syntax):
caepc1 slots=1
caepc2 slots=1
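With more than two nodes, the host list can be generated instead of typed by hand. The sketch below assumes every node offers the same number of slots; `make_mpihosts` is a hypothetical helper name, not part of Code-Aster or OpenMPI.

```shell
# Hypothetical helper: print one OpenMPI hostfile line per host.
# Usage: make_mpihosts SLOTS HOST [HOST...]
make_mpihosts() {
    slots=$1
    shift
    for h in "$@"; do
        echo "$h slots=$slots"
    done
}
```

For instance, `make_mpihosts 1 caepc1 caepc2 | sudo tee /opt/aster110/etc/codeaster/aster-mpihosts` would write the same entries as the example above.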
4) optional: if you have more than 8 GB of RAM per node or more than 16 cores in the cluster, also edit /opt/aster110/etc/codeaster/asrun to tune "interactif_memmax" (max memory per node) and "interactif_mpi_nbpmax" (number of cores in the cluster)
(optional) passwords: if using LiveDVD/LiveUSB mode, you need to set a password for the default user caelinux,
so on each node, run "passwd" in a terminal (the default password is empty) to set a new password
5) ssh setup: you need ssh login without passwords between the two hosts:
on first node, run
scp /home/caelinux/.ssh/id* caepc2:/home/caelinux/.ssh/
scp /home/caelinux/.ssh/authorized* caepc2:/home/caelinux/.ssh/
ssh-keyscan caepc1 >> /home/caelinux/.ssh/known_hosts
ssh-keyscan caepc2 >> /home/caelinux/.ssh/known_hosts
scp /home/caelinux/.ssh/known_hosts caepc2:/home/caelinux/.ssh/
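Before going further it is worth checking that the logins really work without a prompt. The following is a sketch only; `check_ssh` is a hypothetical helper name, and it uses the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```shell
# Hypothetical helper: try a non-interactive login to each host.
# BatchMode=yes makes ssh fail instead of asking for a password.
# Usage: check_ssh HOST [HOST...]
check_ssh() {
    for h in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
        fi
    done
}
```

Run `check_ssh caepc1 caepc2` on each node; every line should end in "ok", since the logins have to work in both directions.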
6) set up a shared temp directory with NFS
on node 1
sudo mkdir /srv/shared_tmp
sudo chmod a+rwx /srv/shared_tmp
sudo nano /etc/exports
then add the following line and save:
/srv/shared_tmp *(rw,async)
then
sudo exportfs -a
Now create the mount point and mount the shared folder. Run this on all nodes:
sudo mkdir /mnt/shared_tmp
sudo chmod a+rwx /mnt/shared_tmp
sudo mount -t nfs -o rw,rsize=8192,wsize=8192 caepc1:/srv/shared_tmp /mnt/shared_tmp
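To confirm that the folder is really mounted over NFS on a node, you can look it up in the kernel's mount table. This is a minimal sketch; `is_nfs_mounted` is a hypothetical helper name, and it assumes the usual Linux /proc/mounts layout (device, mount point, filesystem type, options).

```shell
# Hypothetical helper: succeed if MOUNTPOINT is listed as an NFS mount.
# Usage: is_nfs_mounted MOUNTPOINT [MOUNTS_TABLE]
# The table defaults to /proc/mounts.
is_nfs_mounted() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    # field 2 is the mount point, field 3 the filesystem type (nfs, nfs4, ...)
    awk -v mp="$mountpoint" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$table"
}
```

For example, `is_nfs_mounted /mnt/shared_tmp && echo mounted` on each node. A further sanity check is to create a file in /mnt/shared_tmp on one node and list it from the other.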
7) set up the Aster config to use this shared temp directory:
nano /opt/aster110/etc/codeaster/asrun
edit the line with "shared_tmp" as follows:
shared_tmp : /mnt/shared_tmp
then save
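After saving, the value can be read back to make sure the edit took effect. The sketch below assumes the "key : value" layout shown above; `get_shared_tmp` is a hypothetical helper name, not an asrun command.

```shell
# Hypothetical helper: print the shared_tmp value from an asrun-style config file.
# Only lines that start with "shared_tmp" (after optional blanks) are considered.
# Usage: get_shared_tmp CONFIG_FILE
get_shared_tmp() {
    sed -n 's/^[[:space:]]*shared_tmp[[:space:]]*:[[:space:]]*//p' "$1"
}
```

Calling `get_shared_tmp /opt/aster110/etc/codeaster/asrun` should then print /mnt/shared_tmp.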
8) Open ASTK, go to the server settings and refresh; create your job, then
select Options
ncpus=1 (no OpenMP),
mpi_nbcpu = total number of cores to use (mpi_nbnoeud * cores per host)
mpi_nbnoeud = number of compute nodes
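If the slots values in aster-mpihosts reflect the cores you want to use on each host, the two MPI options can be read off that file. This is a sketch under that assumption; `count_slots` is a hypothetical helper name, and treating a host without a slots= entry as one slot is a choice made here, not a statement about OpenMPI.

```shell
# Hypothetical helper: report node count and total slots from an OpenMPI hostfile.
# Usage: count_slots HOSTFILE
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        {
            nodes++
            n = 1                       # assumption: no slots= entry means one slot
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
            total += n
        }
        END { printf "mpi_nbnoeud=%d mpi_nbcpu=%d\n", nodes, total }
    ' "$1"
}
```

With the two-host file from step 3, `count_slots /opt/aster110/etc/codeaster/aster-mpihosts` gives mpi_nbnoeud=2 and mpi_nbcpu=2.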
And finally it should run on several nodes!!
Actually, the hard part is that you NEED a shared tmp folder to run jobs on a cluster.