
Numerical Relativity at UT Brownsville

Running a simulation on the Beowulf

Parallelization is one of the capabilities that Cactus offers. Here we explain how to set up and run a numerical relativity simulation on a cluster.

First of all, you must have an account on lobizon.utb.edu to access the cluster resources. In principle you should be able to log in to Lobizon directly from your local host; for this, ask the cluster administrator to allow your local host to connect to Lobizon, otherwise your connection will be refused. If you still cannot connect to Lobizon directly from your host, try the following configuration for your eth0 device. Locate the file called ifcfg-eth0; it should contain:

DEVICE='eth0' 
BOOTPROTO='static'
BROADCAST='206.254.3.255'
IPADDR=[your ip address]
NETMASK='255.255.255.0'
ONBOOT='yes'
GATEWAY='206.254.3.250'
TYPE='Ethernet'
USERCTL='no'

The important lines are those that set BROADCAST, NETMASK, and GATEWAY.
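If the connection still fails, it is worth checking that these values agree with each other. The following Python sketch is not one of the cluster's tools; it uses the standard ipaddress module to test that BROADCAST is the broadcast address of the subnet defined by IPADDR and NETMASK, that GATEWAY lies inside that subnet, and that IPADDR is a usable host address. The host address 206.254.3.17 in the example is made up.

```python
import ipaddress


def check_ifcfg(ipaddr, netmask, broadcast, gateway):
    """Return a list of inconsistencies between IPADDR, NETMASK, BROADCAST and GATEWAY."""
    problems = []
    addr = ipaddress.IPv4Address(ipaddr)
    # The subnet described by IPADDR/NETMASK; strict=False tolerates host bits in the address.
    net = ipaddress.IPv4Network(f"{ipaddr}/{netmask}", strict=False)
    if addr in (net.network_address, net.broadcast_address):
        problems.append(f"IPADDR {ipaddr} is not a usable host address in {net}")
    if ipaddress.IPv4Address(broadcast) != net.broadcast_address:
        problems.append(f"BROADCAST should be {net.broadcast_address}, not {broadcast}")
    if ipaddress.IPv4Address(gateway) not in net:
        problems.append(f"GATEWAY {gateway} is outside {net}")
    return problems


if __name__ == "__main__":
    # 206.254.3.17 is a hypothetical host address; the other values are from ifcfg-eth0 above.
    print(check_ifcfg("206.254.3.17", "255.255.255.0", "206.254.3.255", "206.254.3.250"))
```

An empty list means the four values are consistent with one another.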

To run a simulation you need just two files: the executable and the parameter file. The executable must include all thorns specified in your parameter file. Copy both files to an appropriate place in your home directory using scp, for example:

scp cactus_exe [username]@lobizon.utb.edu:~
scp parameter.par [username]@lobizon.utb.edu:~
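Because the executable must include every thorn activated in the parameter file, it can save a wasted run to check this before copying. The Python sketch below is not part of the cluster's tools: it collects the thorn names from the ActiveThorns statements of a parameter file and reports those missing from a list you supply. Where that list comes from is an assumption; one option is the ThornList the executable was built from, whose entries have the form arrangement/thorn.

```python
import re


def active_thorns(parfile_text):
    """Collect thorn names from ActiveThorns = "..." statements in a Cactus parameter file."""
    # Drop comments; assumes '#' never appears inside a quoted value.
    text = "\n".join(line.split("#", 1)[0] for line in parfile_text.splitlines())
    thorns = set()
    # The quoted list may span several lines; the parameter name is matched case-insensitively.
    for match in re.finditer(r'ActiveThorns\s*=\s*"([^"]*)"', text, re.IGNORECASE):
        thorns.update(match.group(1).split())
    return thorns


def missing_thorns(parfile_text, compiled_thorns):
    """Thorns activated in the parameter file but absent from compiled_thorns."""
    # Accept "arrangement/thorn" entries; comparing names case-insensitively is an assumption.
    compiled = {entry.split("/")[-1].lower() for entry in compiled_thorns}
    return sorted(t for t in active_thorns(parfile_text) if t.lower() not in compiled)
```

For example, missing_thorns(open("parameter.par").read(), ["CactusBase/Time", "EinsteinBase/ADMBase"]) returns the activated thorns that are not in that (hypothetical) list.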

Now log in to Lobizon.

ssh [username]@lobizon.utb.edu 

To run the simulation you must copy the executable and the parameter file to all nodes in the cluster. For this, use a Perl script called CopytoAll.pl, which is currently basic but functional. Edit the script to give the full paths of your executable and parameter files. The lines you must change are

$filename1="/home/[username]/parameter.par";

$filename2="/home/[username]/cactus_exe";

Save your changes and run the script:

perl CopytoAll.pl

Your executable and parameter file have now been copied to all of the nodes.
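The contents of CopytoAll.pl are not shown on this page, so the following is only a hypothetical Python sketch of the same idea: copy each file to each node in turn. The node names n001, n002, ... extrapolate from the node n001 mentioned on this page; the destination directories (chosen to match running ./exe/cactus_exe from /Cactus) and the use of rcp, relying on the same password-free trust between hosts that rlogin uses, are assumptions. By default it only prints the commands.

```python
import subprocess


def node_names(count, prefix="n"):
    """Hypothetical naming scheme extrapolated from n001: n001, n002, ..."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def copy_commands(targets, nodes, copier="rcp"):
    """One copy command per (node, file); targets maps local path -> remote directory."""
    # Assumption: rcp works between hosts without a password; use "scp" if it does not.
    return [[copier, local, f"{node}:{remote_dir}"]
            for node in nodes
            for local, remote_dir in targets.items()]


def copy_to_all(targets, nodes, copier="rcp", dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in copy_commands(targets, nodes, copier):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a copy fails


if __name__ == "__main__":
    # Hypothetical destinations on each node.
    targets = {
        "/home/[username]/cactus_exe": "/Cactus/exe/",
        "/home/[username]/parameter.par": "/Cactus/",
    }
    copy_to_all(targets, node_names(4))
```

Looping over nodes one at a time keeps the sketch simple and stops at the first failed copy, so a node that refuses the connection is easy to spot.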

We'll run the simulation so that it is started on one of the nodes, and this node starts the processes on the remaining nodes. To do this, you must log in to one of the nodes, which you can identify by name or IP address. Suppose we log in to the node called n001.lobizon.utb.edu, or simply n001. Type

rlogin n001

You will notice that you don't need a password to log in; this is the normal behavior. If you are prompted for a password, try logging in to another node and inform the system administrator. Once you are logged in to n001, go to the directory /Cactus. Before mpirun can work, you need to run lamboot, which requires a file listing the nodes you will use to run the simulation. For a full explanation of the syntax of this file, refer to the manual pages of lamboot; for our purposes, you can ask the system administrator to provide this file. Once you have it, type

lamboot -vb nodes

where nodes is the file containing the list of nodes. If lamboot succeeds, you will see the message "topology done". You are now ready to start your simulation; type

mpirun -np N ./exe/cactus_exe parameter.par

where N is the number of processes to start. Your simulation should now be running.
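The value of N is up to you; one common choice is one process per CPU listed in the nodes file. The Python sketch below is an assumption rather than a tool installed on Lobizon: it counts those CPUs, assuming the LAM boot schema layout of one host name per line with an optional cpu=N field and # comments, and builds the mpirun command used on this page.

```python
def count_cpus(schema_text):
    """Sum the CPUs in a LAM-style boot schema (assumed layout: host [cpu=N] ..., '#' comments)."""
    total = 0
    for line in schema_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        cpus = 1  # a host without cpu=N is counted as one CPU
        for field in fields[1:]:
            if field.startswith("cpu="):
                cpus = int(field[len("cpu="):])
        total += cpus
    return total


def mpirun_command(nprocs, exe="./exe/cactus_exe", parfile="parameter.par"):
    """Argument list for the mpirun invocation shown on this page."""
    return ["mpirun", "-np", str(nprocs), exe, parfile]
```

For example, mpirun_command(count_cpus(open("nodes").read())) gives the full argument list with N set to the number of listed CPUs.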

Note that you don't need root access to run a simulation; ordinary user access is enough. It's strongly recommended that you read the manual pages of mpirun and lamboot to get an idea of what they do.