5 January 2002: Using SRE2003 daemons

A description of SRE2003 procedures for implementing daemons and data structures.


Contents:

1.  Introduction
2.  Overview
2.1.    Synopsis of SRE_DMN_ procedures (including typical return)
3.  Detailed Description of the SRE_DMN procedures
3.1.    SRE_DMN_LAUNCH
3.2.    SRE_DMN_OWN
3.3.    SRE_DMN_ASK
3.4.    SRE_DMN_LISTEN
3.5.    SRE_DMN_TIMELEFT
3.6.    SRE_DMN_RESPOND
3.7.    SRE_DMN_COMMAND
3.8.    SRE_DMN_KILL
4.0. Flags and Queues
4.1.    SRE2003 Flags
4.2.    SRE2003 Queues


                        ------------------  

1. Introduction

SRE2003 makes extensive use of "daemons". Daemons are threads running in
parallel to the main program. Some daemons are transient (such as request
and transaction daemons), while others are permanent (such as daemons
that handle caching and auditing).

In all cases, SRE2003 uses the SRE2003 daemon manager procedures to
handle the creation, completion, and (most importantly) the communications
between daemons.  This document describes these procedures.

In addition to "daemons", SRE supports "flags" and "queues". These are
described in section 4.


                        ------------------  
2. Overview

The SRE_DMN procedures:

 SRE_DMN_launch   -- launch a daemon
 SRE_DMN_own      -- create identifier information
 SRE_DMN_ask      -- used to send a request to the daemon
 SRE_DMN_listen   -- used by a daemon to wait for a request
 SRE_DMN_respond  -- used by the daemon to reply to a request 
 SRE_DMN_timeleft -- used by daemon to determine max wait time remaining 
 SRE_DMN_command  -- send a command to a daemon
 SRE_DMN_kill     -- kill a daemon (without a graceful shutdown)

All of these are written in REXX, and all of these are meant to be used as
procedures called by REXX programs and procedures.

The basic strategy is as follows:

 Let's assume you have a "daemon" called MONITOR1 (say, it's a daemon
 that monitors the status of some resource). You also have your main program.

 1) In your main program, launch the MONITOR1 daemon using SRE_DMN_LAUNCH

 2) After whatever initializations it needs to do, MONITOR1 should
    start "listening" for requests. This is done using calls to
    SRE_DMN_LISTEN. Typically, this is done with a loop, so that
    after a finite wait with no request arriving (say, 1 minute), 
    MONITOR1 might do some housekeeping and then listen again.

 3) The main program would "request" information by using
    SRE_DMN_ASK to send a request to the daemon.  Typically,
    a finite waiting time is given -- so that if the daemon does
    not respond, some other action can be taken.
    The response is returned as the value of the SRE_DMN_ASK function.
    Thus, FROM THE POINT OF VIEW OF THE MAIN PROGRAM, these daemons
    are "called" through the intermediary of SRE_DMN_ASK.

   There is one extra step: 
     the "main program" should set up a "own_id" that is used in
     calls to SRE_DMN_ASK. This is done by a call to SRE_DMN_OWN.   
        
 4)  When a request is issued (when the main program calls SRE_DMN_ASK),
     the SRE_DMN_LISTEN function will return the request info to 
     the daemon.  The daemon then uses this info to form a response, just
     as if the daemon were called as a procedure. 
        In a sense, the information returned by SRE_DMN_LISTEN is used 
        by the daemon, instead of the value of a "parse arg".
        
     Instead of using "return" to send this response back to the main
     program, the daemon uses SRE_DMN_RESPOND.

     There is one optional step:
        the daemon can call SRE_DMN_TIMELEFT to be sure that the main program 
        is still waiting for a response.

Note: the SRE_DMN_COMMAND procedure is for communicating control information
      between SRE_DMN functions.
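
 As a rough illustration, the steps above might be coded along the following
 lines. This is a sketch, not code taken from SRE2003: the file name
 MONITOR1.RXX, the 'STATUS' request, the reply text, and the wait times
 are all made up for the example.

 The main program:

     /* launch the daemon; give up if the launch failed */
     stat=sre_dmn_launch('MONITOR1','MONITOR1.RXX')
     if abbrev(stat,'error')=1 then do
        say 'Could not launch MONITOR1: ' stat
        exit
     end
     parse var stat tid ',' dmn_id

     /* create an own_id, then ask; wait up to 5 seconds for the reply */
     myid=sre_dmn_own()
     ans=sre_dmn_ask(dmn_id,'STATUS',5000,myid)
     if ans=' ' then say 'No reply from MONITOR1'
     else say 'MONITOR1 replied: ' ans

 The daemon (MONITOR1.RXX):

     parse arg daemon_id,p1
     do forever
        req=sre_dmn_listen(daemon_id,60000)   /* listen for up to 1 minute */
        if req=' ' then do
           /* no request arrived: do housekeeping here, then listen again */
           iterate
        end
        if abbrev(req,'error')=1 then leave
        parse var req astamp ',' param
        if astamp='EXIT' then leave           /* told to exit */
        foo=sre_dmn_respond(astamp,'Reply to ' param)
     end
     exit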


                        ------------------  


2.1 Synopsis of SRE_DMN_ procedures (including typical return)

   STUF=SRE_DMN_Launch(daemon_name,daemon_file,[verify,p1,...,p12])
       returns thread_id ',' daemon_id
       or, if verify=1,   returns thread_id ',' daemon_id ',' verify_result

   STUF=SRE_DMN_own([extra1,justsay])
        returns client_id

   STUF=SRE_DMN_ask(daemon_name,param,waitmsec,[ownid,syscommand])
        returns response
        or, in two part mode, returns length_response ',' response

   STUF=SRE_DMN_listen(daemon_id,max_semwait)
       if request found, returns astamp ',' param
       otherwise ' '

   STUF=SRE_DMN_timeleft(astamp)
        returns ' ' if no time left, else the time left (as seconds.hsec)

   STUF=SRE_DMN_respond(astamp,results)
        returns 1, or an 'error ' message

   STUF=SRE_DMN_command(daemon_name,cmd_name,msec)
        used for inter-daemon communication

                        ------------------  


3. Detailed Description of the SRE_DMN procedures 


3.1 SRE_DMN_launch: 

Launch a daemon in a separate thread, and return a daemon id

stuf=SRE_DMN_Launch(daemon_name,daemon_file,[verify,p1,...,p12])

  daemon_name -- a colloquial name to use for this daemon
               If it starts with / or \, then the daemon is "global"
               Otherwise, it's "process specific"

  daemon_file -- file containing the rexx code.
                Alternatively, if the first character of daemon_file is ':', then
                the rest of the string identifies a macrospace procedure to
                be used (as the daemon)

  verify    --   Optional.
                 If verify>0: after launching the daemon, wait verify
                 milliseconds to see if the daemon successfully launched.
                 If verify=1, then wait 10 seconds for verification

                Deprecated (use SRE_DMN_COMMAND for an alternative):
                    If verify='?', then just check to see if the daemon exists
                    (daemon_file is ignored, param1.. are ignored)
                    This check is against the daemon's semaphore.

   p1 .. p12  -- Optional. Up to 12 parameters that the daemon code can read.
                The daemon should read these using
                parse arg dmn_id,p1,...,p12

Stuf contains either:
 Normal call:
    thread_id ',' daemon_id
       the thread_id is what thread the daemon is running under. The daemon_id is
       used by other SRE_DMN procedures such as SRE_DMN_ASK 
       (daemon_id is a slightly more efficient alternative to the daemon_name),
 or
    'error (launch) 'error_message
       if an error occurred.

 Normal call, with verify>0
  thread_id ',' daemon_id ',' verify_result
      verify_result is the result of a verify call -- if it begins with 
      'error', then an error occurred
  Or, error message (as above)

 Verify=? call
   0 or 1 
      0=daemon does not exist, 
      1=daemon does exist
 

Notes:
  * the launched daemon should always start with
        parse arg daemon_id,param1,...,param12
    the daemon_id is used in sre_dmn_listen and sre_dmn_timeleft

  * daemon_id structure is:  queue_name' 'sem_name' 'daemon_name

  * Hint: after calling SRE_DMN_LAUNCH, use something like:
       stat=sre_dmn_launch(..)
       if abbrev(stat,'error')=1 then do
          say "Error launching daemon: " stat
       end
       else do
         parse var stat tid ',' dmn_id 
       end
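
  * A variation that uses verify might look like this (a sketch; the
    daemon name, file name, and 5000 millisecond wait are arbitrary):
       stat=sre_dmn_launch('MONITOR1','MONITOR1.RXX',5000)
       if abbrev(stat,'error')=1 then do
          say "Error launching daemon: " stat
       end
       else do
          parse var stat tid ',' dmn_id ',' vres
          if abbrev(vres,'error')=1 then say "Launched, but not verified: " vres
       end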

                -----------------------

3.2. SRE_DMN_OWN

Create an "Own-id" for use by a client. The own_id is typically used with
calls to sre_dmn_ask.

Technically, SRE_DMN_OWN creates (or clears) a "thread specific" queue
and semaphore, and returns a structured string containing this
information.

Call as:    
   STUF=sre_dmn_own([extra1,justsay])

Where:

   extra1 (optional)
         Extra "id" information. This should be a single word (no embedded 
         spaces), typically unique to the caller. Thus, if several threads 
         may call a daemon, each thread should use its own "extra1" id. 
         This is optional -- its use can save a fraction of a second.

  justsay  (optional)
      Controls what is done
          not specified -- create the own_id
          1 - return, but do not create, client_id
              This is used when you know that the own_id was 
              created (by an earlier call to SRE_DMN_own)
          2 - return a "success flag" along with the client_id

STUF is equal to:
  If justsay<>2 :
    If success:
        client_id 
    If failure
        ' '
  If justsay =2 :
        success_flag ',' client_id
    success_flag is 0 if the client_id could not be created (that is, if an
                    error occurred); 1 if it could.

Notes:
  * error messages are written to pmprintf
  * the client_id structure is:
      own_queue' 'own_sem' 'extra1   
   where extra1 may be ' ' (if it was not specified)    

Examples: 

  id=sre_dmn_own()

  id=sre_dmn_own('THREAD_10')
  
  parse value sre_dmn_own('THREADXX',2) with status ',' id
       
                -----------------------

3.3. SRE_DMN_ASK

Make a request to a daemon, and wait for the response.

Usage:
   STUF= SRE_DMN_ask(daemon_name,param,waitmsec,ownid,syscommand)

where:
   daemon_name 
     The daemon's name (or its daemon_id, as returned by SRE_DMN_LAUNCH).
     If you have it, use the daemon_id (it speeds things up a bit).

   param
        An arbitrarily long string (not necessarily a text string)
        containing information to be passed to the daemon
        For now, only one parameter can be passed. If you want to
        send multiple values, you'll have to add your own separator
        (such as a comma), and have the daemon parse them out.

   waitmsec 
        Number of milliseconds to wait for reply.
                If not specified, 90 seconds
                If 0, no wait (SRE_DMN_ASK will immediately return ' ')
                If -1, infinite wait
                If -2, quick no wait (same as 0, with extra shortcuts)
                
        Waitmsec can also have a second word: "E". If this is present,
        then a "two part" response is returned (status,value).
        Otherwise, just the value is returned.
                
       For example:
            '2000 E'  means "wait 2 seconds, return a two part response"
             2000     means "wait 2 seconds, just return the value"
  
    ownid
        Optional. Own-id (as created by sre_dmn_own).
        Specifying own-id can speed things up a bit
        If ownid=0, then do NOT wait for a response (this
        is exactly the same as using waitmsec=-2)
        
  syscommand 
        Optional. Used in special cases (to talk to other sre_dmn_ 
        procedures). Not recommended for normal use (see 
        SRE_DMN_COMMAND instead)

STUF is:
  If waitmsec=-2 or = 0
     ' '
  If TwoPart Mode
    'error (ask) 'error_message -- if error occurred
    length_response ',' response -- value returned by the daemon
  otherwise (if not two part mode)
    ' '         : error
    response    : the value returned by the daemon (which might be ' ')


Notes:
   * sre_dmn_ask assumes that the daemon uses sre_dmn_listen!
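
Example:
   The following sketch uses two part mode, so that errors can be told
   apart from an empty response. The request string and its comma
   separated layout are invented for the example; dmn_id and myid are
   assumed to come from earlier calls to SRE_DMN_LAUNCH and SRE_DMN_OWN.

      res=sre_dmn_ask(dmn_id,'LOOKUP,USER1','2000 E',myid)
      if abbrev(res,'error')=1 then do
         say 'Request failed: ' res
      end
      else do
         parse var res rlen ',' response
         say 'Response of length ' rlen ': ' response
      end

   The daemon would then split the parameter itself, say with:
      parse var param verb ',' who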

                -----------------------

3.4. SRE_DMN_LISTEN

Used by daemons to "listen for requests" submitted by SRE_DMN_ASK

Usage:
  STUF=  SRE_DMN_listen(daemon_id,max_semwait)

Where: 
  daemon_id 
    The daemon's "id". This is the first argument sent to the 
    daemon by SRE_DMN_LAUNCH.

 max_semwait 
    Milliseconds to listen for. If 0, check for a query, and immediately
    return

STUF equals:

   No pending request (i.e. timed out with no request detected):
        ' '
   Error occurred:
       'error 'error_message 

   A request was made:
       request_stamp ',' param
   where:
      request_stamp 
         contains "client id info". This is used when the daemon calls
         sre_dmn_timeleft and sre_dmn_respond 
     param 
         the parameter provided in the call to sre_dmn_ask.
         Note that only 1 parameter can be sent. If you need to send more
         than one variable, it is up to you to figure out how
         to combine them (say, by using commas as delimiters).


Notes:
  * the request_stamp has the structure:  queue semid id timedone syscommand 
    where timedone has the structure  julian_day:seconds.hsec     

  * sre_dmn_listen looks for "syscommands" that are used by SRE_DMN_COMMAND.

    For some syscommands (such as TID), a response is sent back 
    (to SRE_DMN_ASK), and the daemon is not notified.

     For the EXIT syscommand, SRE_DMN_LISTEN will cut the connection
     (kill the queue and semaphore used for inter-daemon communication),
     tell the daemon to EXIT, and exit -- it will NOT respond to SRE_DMN_ASK!

    Note that properly coded daemons should look for a  request_stamp of
    "EXIT" -- this means "exit asap".  

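  * A daemon's listening loop has to tell these cases apart. One possible
    arrangement is sketched below (the 500 millisecond wait and the reply
    text are arbitrary; splitting on the first comma assumes that the
    request_stamp itself contains no commas):

       do forever
          req=sre_dmn_listen(daemon_id,500)
          select
             when req=' ' then nop           /* nothing pending: do other work here */
             when abbrev(req,'error')=1 then leave
             otherwise do
                parse var req astamp ',' param
                if astamp='EXIT' then leave  /* exit asap */
                foo=sre_dmn_respond(astamp,'Got: ' param)
             end
          end
       end
       exit
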

                -----------------------

3.5. SRE_DMN_TIMELEFT

Time remaining for this request (the request must have been
submitted by sre_dmn_ask)

Usage:
  STUF=sre_dmn_timeleft(astamp)

where:
  astamp
     the request_stamp provided by sre_dmn_listen


STUF equals
  If a timeout occurred (given the value of waitmsec in SRE_DMN_ASK).
      ' '
  Otherwise, the time remaining, as
      seconds.hsec

SRE_DMN_TIMELEFT can be called as many times as needed -- a daemon
can use this to know when time is running out (say, that 
it's a good idea to send a partial response).
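
For example, a daemon that builds its response piece by piece might check
the clock before each piece. This is a sketch under a few assumptions: req
is a value returned by SRE_DMN_LISTEN, the half second cutoff is arbitrary,
and translate() merely stands in for some slower real work.

   parse var req astamp ',' param
   answer=''
   do i=1 to words(param)
      tleft=sre_dmn_timeleft(astamp)
      if tleft=' ' then leave            /* client is no longer waiting */
      if tleft<0.5 then leave            /* nearly out of time: stop here */
      answer=answer||translate(word(param,i))' '
   end
   /* send whatever was assembled, if the client is still waiting */
   if sre_dmn_timeleft(astamp)<>' ' then foo=sre_dmn_respond(astamp,answer)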


                -----------------------

3.6. SRE_DMN_RESPOND

Used by a daemon to return a response to client (assuming
the client used sre_dmn_ask, and the daemon read the
request using sre_dmn_listen).

Usage:
   STUF= SRE_DMN_respond(astamp,results)

where:
   astamp
        the request-stamp returned by sre_dmn_listen
   results
        the response

STUF equals:
   If success
        1
    otherwise
       'error 'error_message

Note: if astamp equals 0, then SRE_DMN_RESPOND does nothing (astamp will
      equal 0 when SRE_DMN_ASK used a wait time of 0).

      if astamp equals 'EXIT', SRE_DMN_RESPOND will immediately exit.
      This is actually a condition that should not occur --  a properly
      programmed daemon will look for a request-stamp of 'EXIT', and
      exit ASAP if it is found.

                -----------------------

3.7. SRE_DMN_COMMAND

Low level daemon manipulation commands (kill, disconnect, etc.)

Usage:
   STUF=SRE_DMN_command(daemon_name,acommand,msec)

Where:
   daemon_name 
       the daemon's name, or its daemon_id  (as provided by sre_dmn_launch)
     OR
       For the KILL command, the argument returned when the daemon was
       launched by SRE_DMN_LAUNCH (this is: thread_id,daemon_id).
   acommand 
       the command to perform
   msec 
      milliseconds to wait

STUF  depends on acommand

Valid commands are:
   CONNECTED  -- if daemon is connected (is able to receive requests),
                 return 1. Else return 0.
                 This works by checking the daemon's semaphore.
   DISCONNECT -- disconnect the daemon (kill its queue and semaphore).
                 This is a more certain way of disconnecting
                 than the DIE command -- since DIE requires that
                 the daemon be listening (with SRE_DMN_LISTEN).
   VERIFY     -- return 1 (if daemon is still running). Otherwise, ' '
                 VERIFY is a more exacting test than CONNECTED.
   TID        -- return thread the daemon is running under
   DIE or EXIT -- disconnect, and "exit" (proper operation requires that
                  the daemon watch for an EXIT "request-stamp")
   KILL        -- DEPRECATED. We recommend using SRE_DMN_KILL instead
                  Kill the daemon --- disconnect and stop its thread.
                  This is more drastic than EXIT, since it does
                  not depend on the daemon using SRE_DMN_LISTEN.
                  However, it is not as clean -- it just tries to kill the 
                  thread without attempting to close things cleanly.
                  Note that to use KILL, the first argument must be the
                  result returned by SRE_DMN_LAUNCH, and NOT the daemon_name.
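
Example (a sketch; MONITOR1 and the 2000 millisecond waits are arbitrary,
and it is an assumption that a wait time is wanted for every command):

   if sre_dmn_command('MONITOR1','CONNECTED',2000)=1 then do
      tid=sre_dmn_command('MONITOR1','TID',2000)
      say 'MONITOR1 is running under thread ' tid
      foo=sre_dmn_command('MONITOR1','EXIT',2000)    /* ask it to stop */
   end
   else say 'MONITOR1 is not connected'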



                                              ------------------  

3.8. SRE_DMN_KILL

An SRE_DMN command to kill a daemon.

This is a harsh command --

  -- it does NOT ask the daemon to shut down gracefully! 

  -- Instead, it kills the queue and semaphore, and attempts to
     kill the thread (that the daemon is running under) 

      ** Sometimes the thread won't die -- in which case the next time
         the daemon tries to use its daemon-id (say, in a SRE_DMN_LISTEN
         call), a "missing queue and semaphore" error will occur.

Usage:
    astat=sre_dmn_kill(augmented_daemon_id)

where:
  augmented_daemon_id 
     the value of the argument returned by SRE_DMN_LAUNCH -- it includes
     the "thread_id," (without the "s) as a prefix to the normal "daemon-id".

Astat will be set to:
  'error n_id)message' if an error occurs
  '1' on success

Notes:
  * all other SRE_DMN procedures use the "normal" daemon-id. If other
    SRE_DMN procedures are given an "augmented" daemon-id, they can
    detect (and remove) the  "thread_id," prefix.
  * if a "thread_id," (without the ") is not present, 
    SRE_DMN_KILL returns an error message
  * the normal daemon-id contains semaphore, queue, and name fields.
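
Example:
   Since SRE_DMN_KILL is harsh, one might first ask the daemon to exit, and
   kill it only if it still seems to be running. A sketch (the names and the
   2000 millisecond waits are arbitrary; stat is assumed to hold a
   successful, non-error result from SRE_DMN_LAUNCH; how VERIFY behaves
   right after an EXIT is not described here, so treat this as a pattern
   rather than tested code):

      stat=sre_dmn_launch('MONITOR1','MONITOR1.RXX')
      parse var stat tid ',' dmn_id
         /* ... later ... */
      foo=sre_dmn_command(dmn_id,'EXIT',2000)
      if sre_dmn_command(dmn_id,'VERIFY',2000)=1 then do
         astat=sre_dmn_kill(stat)             /* note: stat, not dmn_id */
         if abbrev(astat,'error')=1 then say 'Kill failed: ' astat
      end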


                                              ------------------  


4.0 Flags and Queues


4.1: SRE2003 Flags

Flags are essentially semaphores. They are useful means for threads to signal
the existence of a condition. For example, SRE2003 uses flags to signal
a shutdown -- the various daemons check a SHUTDOWN flag on a regular basis,
and exit if it's set equal to 1.

There are three kinds of flags supported by SRE2003.

 Normal:   These are meant to be read by a set of threads in a single process.
           However, they can be read by threads in other processes.
           They are also serialized, hence can be set and read by several
           different threads.

 Global:   Same as normal, but open to all processes. That is, global flags
           are readily set and read by different processes, whereas normal
           flags are designed to be specific to a process.

 Local:    Can only be read by threads in the same process. Not serialized,
           hence should only be used in "set by one, read by many"
           circumstances.
           The biggest advantage of local flags is that they are more quickly
           read (they use the OS/2 environment, rather than event semaphores).

Three procedures are used to work with flags: SRE_FLAG_OPEN, SRE_FLAG_SET, 
and SRE_FLAG_CLOSE.

SRE_FLAG_OPEN : Create a flag.
   Syntax
      stat=SRE_FLAG_OPEN(flagname,state,pid)
   where
      flagname:  The name of the flag.  You will use this name whenever you 
                 check the flag's status.
                 Flagnames that begin with a / or a \ are GLOBAL flags.
                 Flagnames that begin with a # are LOCAL flags.
                 All other flagnames are treated as NORMAL flags.
      state:  Optional. 0 or 1. The flag is set to this state.
              The default is 0
      pid:    Optional. The current process ID. 
              This is used to speed things up a tad.
              It is also used to set/read normal flags running under a 
              different process. If you are content to use normal flags 
              strictly for intra-process communication, and a few milliseconds
              here and there doesn't bother you, then you need not bother 
              specifying this parameter
   
  The returned value is either 1 (for success), or 'error 'error_message.


SRE_FLAG_SET: Read and set a flag
  Syntax:
    vv=sre_flag_set(flagname,state,pid,acum)
  
  where:

     flagname:   name of flag (that was used in sre_flag_open call)
        state:   optional. If specified, either a 0 or a 1.
                 If not specified: READ the flag's value
                 If specified: set the current value 
                 One can think of 0 as "off", and 1 as "on".

          pid:   optional. The process ID this flag refers to.
                         See SRE_FLAG_OPEN for the details.


  The returned value is either a 0 or 1, or 'error 'error_message.
  If state is specified, the return value should be the same as the
  value of "state".


SRE_FLAG_CLOSE: close a flag
  Syntax:
     stat=sre_flag_close(flagname,pid)
  where flagname and pid are as described above.

  The returned value is either 1 (for success), or 'error 'error_message.


Notes on SRE_FLAG procedures:

  * If specifying a GLOBAL flag, be sure to include  the preceding / or \.
    Similarly, for LOCAL flags, be sure to include the preceding #.

    Basically, the following is completely legal:
        stat1=sre_flag_open('MY_FLAG')
        stat2=sre_flag_open('\MY_FLAG')
        stat3=sre_flag_open('#MY_FLAG')
    In this example, three separate flags will be created, each referred to
    by its name (MY_FLAG, \MY_FLAG, or #MY_FLAG).

  * if a process stops, the "flags" are destroyed. Thus, "GLOBAL" 
    flags will disappear when the process that opened them dies.

   * Flagnames are case insensitive

   * Flagnames should NOT start with a number.
     They must NOT contain spaces.
     Basically, start with an alphabetical character, and use _, digits,
     and alphabetical characters elsewhere.

   * If you sre_flag_open a currently existing flag, the currently
     existing flag will be overwritten.
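
Example:
   The shutdown signal described at the start of this section could be
   set up as sketched below. The flag name SHUTDOWN comes from that
   description; the one second listening time is arbitrary, and daemon_id
   is assumed to be the first argument the daemon received.

   In the main program:
      stat=sre_flag_open('SHUTDOWN',0)
      if stat<>1 then say 'Could not open flag: ' stat
         /* ... later, when it is time to stop ... */
      foo=sre_flag_set('SHUTDOWN',1)

   In each daemon:
      do forever
         if sre_flag_set('SHUTDOWN')=1 then leave     /* read the flag */
         req=sre_dmn_listen(daemon_id,1000)
         /* ... handle req ... */
      end
      exit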

4.2: SRE2003 Queues

Queues are essentially FIFO queues -- they are very similar to 
OS/2 REXX queues. They are useful means of storing unique values, 
such as  pointers to available resources. They can also
be used as dynamic and serialized counters.

As with flags, SRE2003 supports both "normal" and "global" queues -- with
global queues signified by names that start with a \ (or a /).

The queue procedures are:

SRE_Q_OPEN: create a simple queue
   Syntax:
      stat=sre_q_open(qname,pid)
   where:
       qname: the name of the queue. As noted, queuenames that begin with
              a \ (or a /) are "global" queues.
         pid: process id. Used just like the "pid" in SRE_FLAG_OPEN.
  
   Returns a 1 for success, or 'error 'error_message


SRE_Q_CLOSE: close a queue
   Syntax:
      stat=sre_q_close(qname,pid)
   where qname and pid are as noted under SRE_Q_OPEN.

   Returns a 1 for success, or 'error 'error_message
   
SRE_Q_SET: add and read contents of a queue

   Syntax:
     vv=sre_q_set(qname,action,value,pid)
   where:
        qname: as described above
       action:  what to do. For example: POP, PUSH, QUEUE, CT, EXISTS, POPALL
                See below for the details
       value:   the value to add, or a value to return on certain
                "empty queue" conditions
        pid:    as described above

   Return depends on the "action"
 
   The supported actions are:
     
         CT: Returns a count of the number of elements in the queue.
             If the queue is not defined, returns a 'error missing queue'

     EXISTS: Check to see if a queue exists. Returns a 0 if no, 1 if yes.

        POP: read the top element of the queue
             Returns its value.
             If the queue is empty, the "value" argument is returned.    
             If "value" is not specified, returns ' '
             If an error occurs, returns '' 

       POPALL: read all elements in the queue
             Return these elements (in order read), using "value" as delimiter
             If the queue is empty, the "value" argument is returned.    
             If "value" is not specified, use ' ' as a delimiter
             If an error occurs, returns '' 

       PUSH: push the "value" onto the top of the queue. If a POP immediately follows
             a PUSH, the PUSHED value will be returned.
             Returns a 1 for success, or 'error 'error_message

      QUEUE: queue a "value" to the bottom of the queue.
             Returns a 1 for success, or 'error 'error_message


   Notes:
      *  Value can be any string (where numbers are treated as strings)
         This includes arbitrarily long strings, which can contain any
         character.
   
      * Sorry, there is no "POP_FROM_BOTTOM" action

      * Queue names are case insensitive

      * Queue names should NOT start with a number.
        They must NOT contain spaces.
        Basically, start with an alphabetical character, and use _, digits,
        and alphabetical characters elsewhere.

       * If you sre_q_open a currently existing queue, the currently
         existing queue will be overwritten.
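
   Example:
      A queue used as a pool of free "slots" might look like the sketch
      below. The queue name FREE_SLOTS, the slot numbers, and the 'NONE'
      marker are invented for the example.

         stat=sre_q_open('FREE_SLOTS')
         do i=1 to 3
            foo=sre_q_set('FREE_SLOTS','QUEUE',i)     /* add slots 1, 2, 3 */
         end
         say 'Free slots: ' sre_q_set('FREE_SLOTS','CT')

         slot=sre_q_set('FREE_SLOTS','POP','NONE')    /* 'NONE' if empty */
         if slot='NONE' then say 'No free slot'
         else do
            say 'Using slot ' slot
            foo=sre_q_set('FREE_SLOTS','QUEUE',slot)  /* give it back */
         end
         foo=sre_q_close('FREE_SLOTS')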

