Running a simulation on the Beowulf
Parallelization is one of the capabilities that Cactus offers. This
section explains how to set up and run a numerical relativity
simulation on a cluster.
First of all, you must have an account on
lobizon.utb.edu in
order to access the cluster resources. In principle you should be able
to log on to Lobizon directly from your local host. To do so, ask the
cluster administrator to grant your local host permission to
connect to Lobizon; otherwise your connection will be refused. If you
cannot connect to Lobizon directly from your host, try the following
configuration for your eth0 device. Locate the file called
ifcfg-eth0. It should contain:
DEVICE='eth0'
BOOTPROTO='static'
BROADCAST='206.254.3.255'
IPADDR=[your ip address]
NETMASK='255.255.255.0'
ONBOOT='yes'
GATEWAY='206.254.3.250'
TYPE='Ethernet'
USERCTL='no'
The important lines are those that specify broadcast, netmask and gateway.
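As a quick sanity check, the three important settings can be read back from the file with a few lines of shell. This is only a sketch: check_ifcfg is a hypothetical helper name, and the path to ifcfg-eth0 is passed as an argument because its location depends on your distribution.

```shell
#!/bin/sh
# check_ifcfg is a hypothetical helper, not part of the cluster setup.
# Usage: check_ifcfg /path/to/ifcfg-eth0
check_ifcfg() {
    file=$1
    status=0
    for key in BROADCAST NETMASK GATEWAY; do
        if grep -q "^${key}=" "$file"; then
            # Print the first value found for this key.
            echo "$key: $(grep "^${key}=" "$file" | head -n 1 | cut -d= -f2-)"
        else
            echo "$key: missing"
            status=1
        fi
    done
    return $status
}
```

The function returns a non-zero status if any of the three keys is absent, so it can be used in a conditional.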
To run a simulation you need just two files: the executable file and
the parameter file. The executable must have been built with all thorns
specified in your parameter file. Copy both files to an appropriate place in
your home directory using scp, e.g.
scp cactus_exe [username]@lobizon.utb.edu:~
scp parameter.par [username]@lobizon.utb.edu:~
Now log in to Lobizon.
ssh [username]@lobizon.utb.edu
In order to run the simulation you must copy the executable and the
parameter file to all nodes in the cluster. For this, use a Perl
script called CopytoAll.pl, which is currently basic but
functional. Edit this file to specify the full paths of your executable
and parameter files. The lines you must change are
$filename1="/home/[username]/parameter.par";
$filename2="/home/[username]/cactus_exe";
Save your changes and run the script:
perl CopytoAll.pl
Your executable and parameter files have now been copied to all of the nodes.
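The contents of CopytoAll.pl are not shown here, but the idea of copying files to every node can be sketched in shell. This is an illustration and not the script itself: copy_to_all, the DRY_RUN variable and the node list file are hypothetical names, and the sketch assumes each file should land at the same path on every node.

```shell
#!/bin/sh
# Hypothetical stand-in for the idea behind CopytoAll.pl (not the real script).
# Usage: copy_to_all NODES_FILE FILE...
# With DRY_RUN=1 (the default) the scp commands are only printed.
copy_to_all() {
    nodes_file=$1
    shift
    # One node name per line; blank lines and '#' comments are skipped.
    while read -r node rest; do
        case $node in ''|'#'*) continue ;; esac
        for f in "$@"; do
            if [ "${DRY_RUN:-1}" = 1 ]; then
                echo "scp $f $node:$f"
            else
                # Assumption: the same absolute path is wanted on each node.
                scp "$f" "$node:$f" < /dev/null
            fi
        done
    done < "$nodes_file"
}
```

For example, DRY_RUN=1 copy_to_all nodes /home/[username]/cactus_exe /home/[username]/parameter.par prints one scp command per node and file, which you can inspect before running again with DRY_RUN=0.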
We'll run the simulation so that it is started on one node, and this
node then starts the processes on the remaining nodes. To do this you
must log in to one of the
nodes. You can identify the nodes by their name or IP address. Suppose
we are going to log in to the node called n001.lobizon.utb.edu, or
simply n001. Now type
rlogin n001
You will notice that you don't need a password to enter; this is the
normal behavior. If you are prompted for a password, try logging in to
another node and inform the system administrator. Once you are logged in
to n001, go to the directory /Cactus. Now you need to run lamboot so
that mpirun can work. For this you must have a file containing the
list of the nodes you will use to run the simulation. For a fuller
explanation of this file and its syntax, refer to the manual pages of
lamboot. For our purposes you can ask the system administrator to
provide this file. Once you have it, type
lamboot -vb nodes
where nodes is the file containing the list of nodes. If lamboot was
successful you will get the message "topology done". Now you are ready to
start your simulation. Type
mpirun -np N ./exe/cactus_exe parameter.par
where N is the number of processes to start. Your simulation should now be running.
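To make the node file and the choice of N more concrete, here is a minimal sketch. The node names and CPU counts are made up for illustration, and count_cpus is a hypothetical helper. It assumes the LAM boot schema format of one host per line with an optional cpu=<count> field, and that you want one process per listed CPU. The administrator's file remains the authoritative one.

```shell
#!/bin/sh
# Sketch only: the hosts and cpu counts below are invented examples.
cat > nodes <<'EOF'
# one host per line, optional cpu=<count>
n001 cpu=2
n002 cpu=2
n003
EOF

# Sum the CPUs listed in a node file (hosts without cpu= count as 1).
count_cpus() {
    awk '!/^[ \t]*(#|$)/ {
        n = 1
        for (i = 2; i <= NF; i++)
            if ($i ~ /^cpu=[0-9]+$/) n = substr($i, 5)
        total += n
    } END { print total + 0 }' "$1"
}

N=$(count_cpus nodes)
# Print the command rather than running it, so the sketch works without LAM.
echo "mpirun -np $N ./exe/cactus_exe parameter.par"
```

Printing the command instead of executing it lets you check the value of N before starting the run.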
Note that you don't need root access to run a
simulation; an ordinary user account is sufficient. It is strongly recommended
that you read the manual pages of mpirun and lamboot to get an
idea of what they do.