libsigwatch: basic signal handling for Fortran

libsigwatch.a is a library of routines to provide simple signal watching for Fortran programs. This allows a minimal level of control of a running program from outside it, for example to tell it to checkpoint itself on receipt of a signal.

Version 1.0, 2011 February 2.

The project home page is http://purl.org/nxg/dist/libsigwatch, and it's hosted at code.nxg.name, where you can find downloads.

It is often useful to have some simple signal handling in larger Fortran programs. For example, a program can handle the INT interrupt signal generated by ^C and shut itself down cleanly, or handle one of the user signals USR1 or USR2 and checkpoint itself in case it crashes at some later stage. However, signal handling is tricky in Fortran (because the function that is registered as a signal handler is later called with its argument passed by value, whereas Fortran routines expect their arguments by reference), so this library provides functions to make it easier.

Background

On Unix, there is a smallish set of signals which may be sent to a running process, and which the process can catch, ignore, or leave to their default action. For example, the INT signal is sent to a process by pressing the interrupt character (usually ^C), HUP is sent when the controlling terminal goes away (for example when you log out), and KILL can be sent either by hand or by the system when it is forcing processes to die. The default action of both INT and HUP is to terminate the process. The KILL signal is one of those which cannot be caught or ignored, but always has its effect. There are also two signals, called USR1 and USR2, which have no predefined meaning and are provided for user convenience; note that their default action is also to terminate the process, so a program should arrange to catch them before anyone sends them.

Each signal has a numeric value -- for example HUP is 1 and KILL is 9 -- and after finding a process's PID with the ps(1) command, you can send signals to it with the kill(1) command:

% kill -HUP <pid>

or

% kill -1 <pid>

Signals thus provide a limited mechanism for communicating with a running program. A useful pattern is to have the program watch for signal USR1, say, and poll for it by calling the function getlastsignal at the end of each iteration of its main loop. If this returns a non-zero response, you might make your program checkpoint itself -- save its state for later restart -- in case the program crashes or has to be stopped for some reason.
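The checkpoint-on-USR1 pattern can be sketched as follows, using the watchsignalname and getlastsignal functions described under Usage below. This is only a minimal sketch: the subroutines dowork and checkpoint are hypothetical stand-ins for your program's real work and state-saving code, and the response value 1 is an arbitrary choice.

```fortran
      program ckptloop

      implicit none

      integer step, status, sig
      integer watchsignalname
      integer getlastsignal

* ask for response 1 whenever USR1 arrives
      status = watchsignalname('USR1', 1)
      if (status .lt. 0) then
         write (*,'(a)') 'could not watch USR1'
         stop
      endif

      do step = 1, 1000
         call dowork(step)
* poll once per iteration; non-zero means USR1 arrived
         sig = getlastsignal()
         if (sig .ne. 0) call checkpoint(step)
      enddo

      end

* hypothetical stand-in for one unit of the program's real work
      subroutine dowork(step)
      implicit none
      integer step
      call sleep(1)
      end

* hypothetical stand-in for saving the program's state
      subroutine checkpoint(step)
      implicit none
      integer step
      write (*,'(a,i5)') 'checkpoint at step', step
      end
```

While this runs, sending kill -USR1 <pid> from another terminal should cause one checkpoint at the end of the current iteration.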

For more details about signals, see the man pages for signal(3) or signal(7), depending on your platform.

Usage

A program prepares to receive signals by calling one of the watchsignalname or watchsignal functions, and calls getlastsignal at any point to retrieve the last signal which was sent to the process.

The arguments to watchsignalname are signame, a character string containing the name of the signal to watch for, and response, an integer which will be returned by getlastsignal after the specified signal has been caught. The signal names which the function recognises are those most likely to be useful, namely HUP, INT, USR1 and USR2.

If response is passed as -1, getlastsignal will instead return the signal number associated with this name. Note that, although both HUP and INT have generally fixed numbers (1 and 2 respectively), the numbers associated with signals USR1 and USR2 are different on different Unix variants.

If you need to catch another signal for some reason (make sure you understand the default behaviour of the given signal first, however) you can give that signal as a number to the watchsignal function, and when that signal is later caught, the corresponding number is what will be returned by getlastsignal.

The getlastsignal function returns the response associated with the last signal which was caught, or zero if no signal has been caught so far, or since the last call to getlastsignal. That is, any caught signal is returned only once.
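As an illustration of how response values and the return-only-once behaviour fit together, here is a short sketch that watches three signals and dispatches on the result. The response values 101 and 102 and the actions printed are arbitrary choices for this example, and the return status of watchsignalname is ignored only for brevity.

```fortran
      program dispatch

      implicit none

      integer i, status, sig
      integer watchsignalname
      integer getlastsignal

* distinct explicit responses, so the two cannot be confused
      status = watchsignalname('USR1', 101)
      status = watchsignalname('USR2', 102)
* response -1: the platform's own number for HUP comes back
      status = watchsignalname('HUP', -1)

      do i = 1, 30
         call sleep(1)
* read once into a variable: a second call would return zero
         sig = getlastsignal()
         if (sig .eq. 101) then
            write (*,'(a)') 'USR1: would checkpoint here'
         else if (sig .eq. 102) then
            write (*,'(a)') 'USR2: would report progress here'
         else if (sig .ne. 0) then
            write (*,'(a,i3)') 'HUP, signal number', sig
         endif
      enddo

      end
```

Storing the result in sig matters: testing getlastsignal() in the condition and then calling it again in the body would see zero the second time.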

The installed signal handler does not re-throw the signal after it has caught it; this would defeat the purpose of this library for those signals, such as HUP and INT, for which the default action is to kill the process. Also, there is no way to tell if the signal was received by being re-thrown by another handler, installed after this one. If all of this matters to you, then this library cannot reasonably help you, and you have no hope but to learn to love the sigaction(2) manpage.

When installing the handler, these functions replace any previous signal handler. If that was a non-default one (for example, one put there by an MPI environment) this could potentially change the behaviour of your program in an unhelpful fashion. To warn you of this, these functions return +1 in this case; this is a success return value, but also a warning that you should understand what that previous signal handler was doing there.

The sigwatchversion function returns the version number of the library, as an integer formed from the version number by major_version * 1000 + minor_version, so that the version number 1.2, for example, would be returned as integer 1002.
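The encoding can be undone with integer division and a remainder. The sketch below assumes that sigwatchversion takes no arguments, like getlastsignal; the check against 1000 (that is, version 1.0) is just an example of a minimum-version test.

```fortran
      program showver

      implicit none

      integer v, major, minor
      integer sigwatchversion

* assumption: sigwatchversion is called with no arguments
      v = sigwatchversion()
* undo major_version * 1000 + minor_version
      major = v / 1000
      minor = mod(v, 1000)
      write (*,'(a,i3,a,i3)') 'major', major, ' minor', minor

      if (v .lt. 1000) then
         write (*,'(a)') 'libsigwatch is older than 1.0'
      endif

      end
```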

Return values

Both watchsignalname and watchsignal return 0 if the signal watching was installed successfully, and -1 if there was an error. If there was a non-default signal handler already installed, it is replaced, but the routine returns 1 to warn you of this.
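A caller might distinguish the three return values along these lines. This is a sketch rather than a required pattern; the messages are illustrative only.

```fortran
      program chkstat

      implicit none

      integer status
      integer watchsignalname

      status = watchsignalname('INT', -1)
      if (status .lt. 0) then
         write (*,'(a)') 'error: INT is not being watched'
         stop
      else if (status .eq. 1) then
         write (*,'(a)') 'warning: replaced an existing handler'
      endif
* status 0 or 1: the watch is installed, so carry on
      write (*,'(a)') 'watching INT'

      end
```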

The function getlastsignal returns the response associated with the last signal caught, or zero if there has been no signal caught since the last time this function was invoked.

Example

The following Fortran program shows the library in use.

      program sigs
      
      implicit none

      integer i
      integer status

      integer watchsignal
      integer watchsignalname
      integer getlastsignal

* watch for signal 10 (which is USR1 on this platform)
      status = watchsignal(10)
      write(*,'("watchsignal 10:",i2)') status
* watch for HUP, too
      status = watchsignalname("HUP", 99)
      write(*,'("watchsignal HUP:",i2)') status

      do i=1,10
         call sleep(1)
         write (*,'("lastsig=", i2)') getlastsignal()
      enddo

      end

Then you can use the library like this:

% g77 -o libsigwatch-demo libsigwatch-demo.f -lsigwatch
% ./libsigwatch-demo & # start in the background ($! now has the PID)
[1] 15131
watchsignal 10: 0
watchsignal HUP: 0
% lastsig= 0
lastsig= 0
lastsig= 0
kill -HUP $!    # send the HUP signal to the process
lastsig=99      # saw it!
% lastsig= 0
...

You can also link against just sigwatch.o if necessary.

Downloading and installation

Download the distribution from code.nxg.name, which is linked from the project home page given above.

To configure, build and install, just use:

% ./configure
% make
% make install

That will install the software into /usr/local. If you want it to go somewhere else, then (as usual with ./configure), specify the alternative location as the argument to configure's --prefix option. See ./configure --help for more details.

This software is copyright 2003, 2005, 2011, Norman Gray. It is free software, released under the terms of the GNU General Public Licence.

Release notes

Version 1.0, 2011 February 2: Changed hosting; documentation preening. No functional differences from 0.2, but it's high time to make this release 1.0.

Version 0.2: Improved documentation.

Version 0.1: Initial version.

Norman
2011 February 2