Checkpointing
Checkpointing is the procedure that lets you save your
simulation data periodically. If something unexpected causes your
simulation to terminate abnormally, you can use
these checkpoints to relaunch the simulation not from the beginning but
from a later point, depending on how often you choose to
checkpoint.
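The idea can be illustrated with a small, self-contained toy that has nothing to do with Cactus itself. The function run, the JSON file format and the stop_at argument are all hypothetical names chosen for this sketch; Cactus handles the equivalent steps internally through its parameter file.

```python
import json
import os
import tempfile


def run(total_iters, checkpoint_every, path, stop_at=None):
    """Toy 'simulation' that sums iteration numbers, checkpointing periodically."""
    # Resume from the checkpoint if one exists, otherwise start from scratch.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"iteration": 0, "value": 0}

    while state["iteration"] < total_iters:
        state["iteration"] += 1
        state["value"] += state["iteration"]  # stand-in for one evolution step
        # Save the full state every `checkpoint_every` iterations.
        if state["iteration"] % checkpoint_every == 0:
            with open(path, "w") as f:
                json.dump(state, f)
        # Optionally pretend the run was killed at a given iteration.
        if stop_at is not None and state["iteration"] == stop_at:
            return None
    return state["value"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
    run(1000, 220, ckpt, stop_at=700)  # "crashes" at iteration 700
    print(run(1000, 220, ckpt))        # resumes from the checkpoint at 660
```

In this toy the work done between iteration 660 and the simulated crash at 700 is lost and recomputed, which is the trade-off behind choosing how often to checkpoint.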
To enable Cactus checkpointing you must add a few lines to your
parameter file. Let's call this file test_checkp.par. Additionally,
you'll need another parameter file to relaunch the simulation from a
given checkpoint. Let's call this file test_recover.par. You can
download these two files and look at them. They contain a brief
explanation of the parameters needed to set up checkpointing.
Now let's see how this works. At this point you should already have an
executable to run the Cactus WaveDemo. All you have to do is run
cactus_wavedemo with test_checkp.par as the parameter file. Go to your
Cactus directory and type:
./exe/cactus_wavedemo test_checkp.par
You'll notice that the output contains messages like this:
INFO (IOFlexIO): ---------------------------------------------------------
INFO (IOFlexIO): Dumping periodic checkpoint at iteration 220
INFO (IOFlexIO): ---------------------------------------------------------
This means that checkpointing is working. You can run ls exe to
verify that the checkpoint file exists. This file will have a name
like cp.chkpt.it_1320.ieee. The name tells you that this
checkpoint corresponds to iteration number 1320. Now stop the
simulation. You can restart it from the iteration specified in the
name of the checkpoint file. To do this, run cactus_wavedemo again,
this time with test_recover.par as the parameter file:
./exe/cactus_wavedemo test_recover.par
Notice that the simulation does not start from iteration zero; it
continues from the checkpoint, picking up at iteration 1330 in the
example above. That's it! You can save a lot of simulation time by
using the checkpointing options.
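If several checkpoint files accumulate, the iteration number in each name can be read programmatically to find the most recent one. The following is a minimal sketch that assumes the naming pattern seen in the example above (a name ending in .it_<iteration>.ieee); the helper names checkpoint_iteration and latest_checkpoint are hypothetical and not part of Cactus.

```python
import re

# Assumed naming pattern, inferred from the example "cp.chkpt.it_1320.ieee".
_CHECKPOINT_RE = re.compile(r"\.it_(\d+)\.ieee$")


def checkpoint_iteration(filename):
    """Return the iteration encoded in a checkpoint file name, or None."""
    match = _CHECKPOINT_RE.search(filename)
    return int(match.group(1)) if match else None


def latest_checkpoint(filenames):
    """Return the name with the highest iteration number, or None if no match."""
    candidates = [(checkpoint_iteration(name), name) for name in filenames]
    candidates = [(it, name) for it, name in candidates if it is not None]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    names = ["cactus_wavedemo", "cp.chkpt.it_220.ieee", "cp.chkpt.it_1320.ieee"]
    print(latest_checkpoint(names))  # cp.chkpt.it_1320.ieee
```

In practice the list of names could come from os.listdir("exe"), since that is where the checkpoint file appears in the example above.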