Numerical Relativity at UT Brownsville

Checkpointing



Checkpointing saves your simulation data periodically. If the simulation terminates abnormally, you can use these checkpoints to relaunch it from a later point rather than from the beginning. How much work you lose depends on how often you checkpoint.

To enable Cactus checkpointing, you must add a few lines to your parameter file. Let's call this file test_checkp.par. You will also need a second parameter file to relaunch the simulation from a given checkpoint. Let's call this file test_recover.par. You can download these two files and look at them. They contain a brief explanation of the parameters needed to set up checkpointing.
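As a rough orientation before opening the two files, the fragment below sketches the kind of lines they might contain. It is not a copy of test_checkp.par or test_recover.par. The parameter names are recalled from the Cactus IOUtil and IOFlexIO thorns and should be treated as assumptions, and the values (the "cp" base name, the exe directory, the interval of 220) are only guesses taken from the file name and log messages shown on this page. Check everything against the downloaded files and the thorn documentation.

```
# Hypothetical checkpointing fragment (compare with test_checkp.par)
IOFlexIO::checkpoint = "yes"    # assumed: let IOFlexIO write the checkpoints
IO::checkpoint_every = 220      # assumed: checkpoint every 220 iterations
IO::checkpoint_file  = "cp"     # assumed: base name of the checkpoint files
IO::checkpoint_dir   = "exe"    # assumed: directory the checkpoints go to

# Hypothetical recovery fragment (compare with test_recover.par)
IO::recover      = "auto"       # assumed: recover from the latest checkpoint found
IO::recover_file = "cp"         # assumed: same base name as above
IO::recover_dir  = "exe"        # assumed: directory to search for checkpoints
```

The two groups belong in separate parameter files: the first in the file used for the initial run, the second in the file used for the relaunch.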

Now let's see how this works. At this point you should already have an executable for the Cactus WaveDemo. All you have to do is run cactus_wavedemo with test_checkp.par as the parameter file. Go to your Cactus directory and type

./exe/cactus_wavedemo test_checkp.par

You'll notice that the output contains messages like this:

INFO (IOFlexIO): ---------------------------------------------------------
INFO (IOFlexIO): Dumping periodic checkpoint at iteration 220
INFO (IOFlexIO): ---------------------------------------------------------

This means that checkpointing is really working. You can run ls exe to verify that the checkpoint file exists. The file will have a name like cp.chkpt.it_1320.ieee, which tells you that this checkpoint corresponds to iteration 1320. Now stop the simulation. You can start it again from the iteration given in the name of the checkpoint file. To do this, run cactus_wavedemo again, this time with test_recover.par as the parameter file:

./exe/cactus_wavedemo test_recover.par

Notice that the simulation does not start from iteration zero; in the example above it starts from iteration 1330. That's it! You can save a lot of simulation time by using the checkpoint options.
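When several checkpoint files have accumulated, it can help to find the most recent one before relaunching. The following is a minimal sketch, not part of Cactus: latest_checkpoint is a hypothetical helper, and it assumes file names of the form <base>.chkpt.it_<iteration>.ieee, as in the example above. It compares iteration numbers numerically, so that it_1320 ranks above it_440.

```shell
#!/bin/sh
# latest_checkpoint DIR
# Hypothetical helper (not a Cactus tool): print the checkpoint file in DIR
# (default: exe) with the highest iteration number.
# Assumes names like cp.chkpt.it_1320.ieee.
# Returns a non-zero status and prints nothing if no checkpoint is found.
latest_checkpoint() {
    dir=${1:-exe}
    best_file=""
    best_iter=-1
    for f in "$dir"/*.chkpt.it_*.ieee; do
        [ -e "$f" ] || continue        # the pattern matched no files
        iter=${f##*.it_}               # drop everything up to the last ".it_"
        iter=${iter%.ieee}             # drop the ".ieee" extension
        case $iter in
            ''|*[!0-9]*) continue ;;   # skip names without a numeric iteration
        esac
        if [ "$iter" -gt "$best_iter" ]; then
            best_iter=$iter
            best_file=$f
        fi
    done
    [ -n "$best_file" ] && printf '%s\n' "$best_file"
}
```

After sourcing this definition, running latest_checkpoint exe from your Cactus directory prints the path of the newest checkpoint, which you can compare with the file your recovery parameter file refers to.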