Error handling in RPG: building programs that fail gracefully

Most RPG programs are written to handle the happy path. The file exists, the record is found, the calculation succeeds, the API responds. When something goes wrong, many programs either crash with a cryptic system message, silently continue with bad data, or dump the user into a raw system error screen that tells them nothing useful.

Good error handling is not defensive paranoia — it is professionalism. A program that fails gracefully tells the user what went wrong in plain language, logs enough detail for a developer to diagnose the problem, cleans up any resources it was holding, and exits cleanly. This post covers the techniques to make that happen in RPGLE.

The two error handling models in RPG

RPGLE has two distinct approaches to error handling that you will use in different contexts:

The *PSSR subroutine and INFSR — the traditional RPG error handling model. A special subroutine that the runtime calls automatically when an unhandled error occurs. Still valid and still used, especially in older codebases.

The %ERROR built-in and the (e) operation extender: operation-level error trapping. You tell a specific operation not to crash on error, then check %ERROR afterwards. This gives fine-grained control without needing a global error handler.

In modern free-format RPG, you will typically use a combination: %ERROR for operations where you expect errors might occur, and *PSSR as a safety net for anything unexpected.

%ERROR and the (e) extender

For file operations and some other operation codes (for example CALLP, TEST, and ALLOC), adding the (e) extender tells RPG not to crash if the operation fails, but to set %ERROR instead:

**FREE

// Read with error handling
read(e) CUSTMAST CustRecord;

if %error;
  // Read failed — handle it
  dsply 'Error reading customer file';
  *inlr = *on;
  return;
endif;

if %eof(CUSTMAST);
  // End of file — normal condition, not an error
  leave;
endif;

The (e) extender works on most file operations: read(e), write(e), update(e), delete(e), chain(e), open(e), close(e).
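
As a small illustration of (e) on an operation other than a read, here is a sketch of an explicit open. It assumes CUSTMAST is declared with USROPN so the program decides when the file is opened; the message text is illustrative only.

**FREE
// USROPN: the file is not opened automatically at program start
dcl-f CUSTMAST usage(*input) keyed usropn;

open(e) CUSTMAST;
if %error;
  // %status(CUSTMAST) is the status of the most recent operation on this file
  dsply ('Could not open CUSTMAST, status ' + %char(%status(CUSTMAST)));
  *inlr = *on;
  return;
endif;

// ... read from the file ...

close(e) CUSTMAST;
*inlr = *on;
return;

Passing the file name to %status ties the check to that file rather than to whichever operation happened to run last.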

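For operations that do not support (e), such as EVAL, wrap the code in a monitor block (see below). System APIs are a separate case: many IBM i APIs accept an error code data structure parameter that reports failures back to the caller instead of raising an exception.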

%STATUS — what went wrong

After %ERROR is set, %STATUS tells you what the error was. It returns a numeric code:

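1021: duplicate key on a write or update (the file has unique keys)
1211: I/O operation attempted on a file that is not open
1218: record locked by another job
1221: update attempted without a prior read
1251: permanent I/O error
1255: session or device error

Record not found on a CHAIN and end of file on a READ are not errors. They do not set %ERROR; test %FOUND and %EOF instead.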

chain(e) CustomerID CUSTMAST CustRecord;

if %error;
  select;
    when %status = 1218;
      // Record locked: another job has it
      dsply 'Customer record is in use. Please try again.';
    other;
      // Unexpected error
      dsply ('Unexpected error: ' + %char(%status));
  endsl;
  *inlr = *on;
  return;
endif;

// Not found is not an error: CHAIN sets %found to *off and leaves %error off
if not %found(CUSTMAST);
  dsply ('Customer not found: ' + %char(CustomerID));
  *inlr = *on;
  return;
endif;

Monitor blocks — structured exception handling

The monitor block is the modern equivalent of the old *PSSR subroutine for handling errors in a specific section of code. It works like try/catch in other languages.

monitor;
  // Code that might fail
  chain CustomerID CUSTMAST CustRecord;
  if not %found(CUSTMAST);
    dsply 'Customer not found';
    return;
  endif;

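  // UPDATE on an externally described file names the record format;
  // CUSTREC is an assumed format name for CUSTMAST
  update CUSTREC CustRecord;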

on-error 1218;
  // Specifically handle record lock
  dsply 'Record locked by another user';

on-error 1021;
  // Duplicate key (the update changed a key field to a value that already exists)
  dsply 'Duplicate record: customer already exists';

on-error *FILE;
  // Any other file error
  dsply 'File error: ' + %char(%status);

on-error *ALL;
  // Catch-all for anything not caught above
  dsply 'Unexpected error occurred: ' + %char(%status);
  // Log, clean up, and exit
endmon;

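on-error clauses match by status code, by error class (*PROGRAM for program status codes 100 to 999, *FILE for file status codes 1000 to 9999), or *ALL to catch everything. The first matching clause runs, just like a select/when structure, so list specific status codes before the general classes.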

Monitor blocks can be nested. The inner block handles what it can; anything it does not handle bubbles up to the outer block.
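
A minimal sketch of that nesting, using hypothetical variables and a sample input value. The inner block only knows how to deal with a non-numeric quantity (status 105); a divide by zero (status 102) raised inside it is not matched there, so it moves out to the enclosing block.

**FREE
dcl-s Input varchar(20) inz('0');   // sample input, hypothetical
dcl-s Qty   packed(7:0);
dcl-s Total packed(9:2) inz(100);
dcl-s Each  packed(9:2);

monitor;
  monitor;
    Qty  = %dec(Input : 7 : 0);  // status 105 if Input is not numeric
    Each = Total / Qty;          // status 102 if Qty is zero
  on-error 105;
    // Handled by the inner block
    dsply 'Quantity is not a number';
    Each = 0;
  endmon;
on-error 102;
  // Not matched by the inner block, so it arrives here
  dsply 'Quantity is zero';
  Each = 0;
endmon;

*inlr = *on;
return;

With the sample value '0' the conversion succeeds, the division fails, and the outer on-error 102 clause runs. Change Input to something like '12x4' and the inner clause handles it instead.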

The *PSSR subroutine — the safety net

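*PSSR is a special subroutine that RPG calls automatically when an unhandled program error occurs: anything not caught by a monitor block or (e) extender. File errors go to the file's INFSR subroutine instead, unless you declare the file with infsr(*PSSR). It is your last line of defence.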

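**FREE
ctl-opt dftactgrp(*no) actgrp(*caller);

// Program status data structure: the runtime fills it in when an error occurs
dcl-ds PgmStat psds qualified;
  Routine *routine;   // RPG routine that was running, e.g. *INIT or *DETC
end-ds;

dcl-s ErrRoutine varchar(256);
dcl-s ErrStatus  int(10);
dcl-s ErrMsg     varchar(500);

// ... main program logic ...

// The safety net: runs on any unhandled program error
begsr *PSSR;
  // %status is set to the error code
  ErrRoutine = %trim(PgmStat.Routine);
  ErrStatus  = %status;
  ErrMsg = 'Program error in ' + ErrRoutine +
           ': status ' + %char(ErrStatus);

  // Log to a file or send a message
  exec sql
    INSERT INTO MYLIB.ERRORLOG
      (LogTime, Program, Routine, ErrStatus, ErrMessage)
    VALUES(CURRENT_TIMESTAMP,
           'MYPGM',
           :ErrRoutine,
           :ErrStatus,
           :ErrMsg);

  // End the program. Sending an escape message at this point
  // (see "Sending messages to the caller") also tells the caller something went wrong.
  *inlr = *on;
  return;
endsr;

RPG has no %routine built-in. The *ROUTINE subfield of the program status data structure holds the name of the RPG routine that was running when the error occurred (for example *INIT or *DETC). For the name of the current procedure, recent compiler levels provide the %PROC built-in, which is invaluable for diagnosing which part of a large program failed.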

INFSR — file-specific error handling

Each file in an RPG program can have its own error handling subroutine, defined with the INFSR keyword on the file declaration:

dcl-f CUSTMAST usage(*update) keyed infsr(CustFileErr);
dcl-f ERRORLOG usage(*output);

// ... main logic ...

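begsr CustFileErr;
  // Called automatically when an operation on CUSTMAST fails and that
  // operation has no (e) extender and is not inside a monitor block.
  // %status(CUSTMAST) holds the file status code.
  if %status(CUSTMAST) = 1218;
    ErrMsg = 'CUSTMAST record locked by another job';
  else;
    ErrMsg = 'CUSTMAST error: ' + %char(%status(CUSTMAST));
  endif;

  // log it...

  // RETURN ends the program. It does not resume after the failed
  // operation, so handle retryable conditions such as record locks
  // with (e) or a monitor block instead.
  *inlr = *on;
  return;
endsr;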

INFSR gives you file-level granularity — useful when different files in the same program need different error responses.

Sending messages to the caller

When a program encounters an error it cannot recover from, the professional response is to send an escape message to its caller rather than just setting *INLR and returning. An escape message signals that the program failed — the caller can either handle it or let it propagate up the call stack.

dcl-pr SendEscapeMsg extpgm('QMHSNDPM');
  MsgID      char(7)   const;
  MsgFile    char(20)  const;
  MsgData    char(256) const;
  MsgDataLen int(10)   const;
  MsgType    char(10)  const;
  CallStkEnt char(10)  const;
  CallStkCnt int(10)   const;
  MsgKey     char(4);
  ErrorCode  char(256) options(*varsize);
end-pr;

dcl-s MsgKey  char(4);
dcl-s ErrCode char(256) inz(*loval);

// Send an escape message to the caller
SendEscapeMsg(
  'CPF9898'    :  // Generic escape message ID
  'QCPFMSG   QSYS      ' :
  'Order processing failed — see job log for details' :
  50 :
  '*ESCAPE   ' :
  '*PGMBDY   ' :
  1 :
  MsgKey :
  ErrCode
);
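
The call above hard-codes the message length. One way to avoid counting characters by hand is a small wrapper procedure, sketched here under assumptions: FailWith is a hypothetical name, the prototype is repeated under a different name (SndPgmMsg) so the block stands alone, and the error code structure is passed with bytes provided set to zero so that a failure inside the API is itself signalled as an exception.

**FREE
ctl-opt nomain;

// Hypothetical helper: sends Text as a CPF9898 escape message
// to the caller of the program that uses it
dcl-proc FailWith export;
  dcl-pi *n;
    Text varchar(256) const;
  end-pi;

  dcl-pr SndPgmMsg extpgm('QMHSNDPM');
    MsgID      char(7)   const;
    MsgFile    char(20)  const;
    MsgData    char(256) const;
    MsgDataLen int(10)   const;
    MsgType    char(10)  const;
    CallStkEnt char(10)  const;
    CallStkCnt int(10)   const;
    MsgKey     char(4);
    ErrorCode  char(256) options(*varsize);
  end-pr;

  dcl-s Key char(4);

  // API error code structure; bytes provided = 0 means
  // "signal errors as exceptions"
  dcl-ds ApiErr qualified;
    BytesProvided  int(10) inz(0);
    BytesAvailable int(10) inz(0);
  end-ds;

  SndPgmMsg('CPF9898' : 'QCPFMSG   QSYS' : Text : %len(Text) :
            '*ESCAPE' : '*PGMBDY' : 1 : Key : ApiErr);
end-proc;

A program would then end with something like FailWith('Order processing failed'); and the message length always matches the text.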

The caller — whether a CL program, another RPG program, or a menu — receives this escape message and can handle it with MONMSG (in CL) or a monitor block (in RPG). Without an escape message, the caller has no way to know the called program failed unless it checks a return parameter explicitly.
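
On the RPG side, handling that escape message might look like the sketch below. ORDERPGM and its parameter are hypothetical; status 202 is the program status RPG sets when a called program or procedure fails.

**FREE
// Hypothetical called program and parameter
dcl-pr OrderPgm extpgm('ORDERPGM');
  OrderID packed(9:0) const;
end-pr;

monitor;
  OrderPgm(12345);
on-error 202;
  // The called program ended with an escape message
  dsply 'Order program failed, see job log';
on-error *program;
  // Any other program error, such as the program not being found
  dsply ('Call error, status ' + %char(%status));
endmon;

*inlr = *on;
return;

The text of the escape message itself lands in the job log, which is why the called program should make it descriptive.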

Building a reusable error logging procedure

Rather than scattering error handling logic through every program, centralise it in a service program procedure:

**FREE
// In a service program: ERRSRVPGM

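ctl-opt nomain;

// Run the log insert outside commitment control so that a caller's
// ROLLBACK does not also remove the log row
exec sql set option commit = *none;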

dcl-proc LogError export;
  dcl-pi *n;
    Program  varchar(10)  const;
    Routine  varchar(256) const;
    ErrCode  int(10)      const;
    ErrMsg   varchar(500) const;
  end-pi;

  exec sql
    INSERT INTO MYLIB.APPLOG
      (LogTime, LogProgram, LogRoutine, LogErrCode, LogMessage, LogUser, LogJob)
    VALUES(
      CURRENT_TIMESTAMP,
      :Program,
      :Routine,
      :ErrCode,
      :ErrMsg,
      USER,
      JOB_NAME
    );

end-proc;

Every program in your application calls LogError when something goes wrong. The log table accumulates all errors in one place, queryable by program, by time, by user, or by error code.

// Using it in any program
on-error *all;
  // %proc() returns the name of the current procedure (recent compiler levels)
  LogError('ORDERPGM' : %proc() : %status : 'Failed to update order record');
  // send escape message and exit
endmon;

Lock wait handling — the practical pattern

Record lock conflicts (%status = 1218) deserve special attention because they are common in multi-user environments and require a retry strategy rather than an immediate failure:

dcl-s RetryCount int(5)  inz(0);
dcl-s MaxRetries int(5)  inz(3);
dcl-s LockWait   int(10) inz(500); // milliseconds

// usleep() takes microseconds. The prototype belongs with the other
// definitions, before the first calculation statement.
dcl-pr usleep extproc('usleep');
  microseconds uns(10) value;
end-pr;

dow RetryCount <= MaxRetries;
  chain(e) CustomerID CUSTMAST CustRecord;

  if not %error;
    leave; // Got it: exit the retry loop
  endif;

  if %status = 1218 and RetryCount < MaxRetries;
    RetryCount += 1;
    // Wait before retrying
    usleep(LockWait * 1000); // Convert ms to microseconds
  else;
    // Retries exhausted, or an error other than a record lock
    LogError('MYPGM' : 'MainProc' : %status : 'CHAIN failed after ' + %char(RetryCount) + ' retries');
    *inlr = *on;
    return;
  endif;
enddo;

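Three retries with a half-second pause between them handle the vast majority of transient lock conflicts without bothering the user. Bear in mind that each attempt first waits for the file's WAITRCD time before status 1218 is raised, so the total delay is longer than the pauses alone.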

What good error handling looks like end to end

Pulling it all together — a procedure that does it right:

dcl-proc ProcessOrder export;
  dcl-pi *n ind;
    OrderID packed(9:0) const;
  end-pi;

  dcl-s Success ind inz(*on);

  monitor;
    // Validate the order exists. There is no (e) extender here, so any
    // I/O error is routed to the on-error clauses below.
    chain OrderID ORDERPF OrderRec;
    if not %found(ORDERPF);
      // Not found is not an error, so there is no status to report
      LogError('ORDERPGM' : 'ProcessOrder' : 0 : 'Order ' + %char(OrderID) + ' not found');
      return *off;
    endif;

    // Update status
    OrderRec.Status = 'PROC';
    OrderRec.ProcessedTime = %timestamp();
    // UPDATE on an externally described file names the record format;
    // ORDERR is an assumed format name for ORDERPF
    update ORDERR OrderRec;

    // Covers the native update only if ORDERPF is declared with the COMMIT keyword
    exec sql COMMIT;

  on-error 1218;
    // Roll back first, then log
    exec sql ROLLBACK;
    LogError('ORDERPGM' : 'ProcessOrder' : 1218 : 'Order ' + %char(OrderID) + ' locked');
    Success = *off;

  on-error *all;
    exec sql ROLLBACK;
    LogError('ORDERPGM' : 'ProcessOrder' : %status : 'Unexpected error processing order ' + %char(OrderID));
    Success = *off;
  endmon;

  return Success;
end-proc;

Every path that changes data either commits or rolls back. The caller gets a clear boolean result. The log table has a record of every failure. No silent data corruption, no cryptic system errors surfaced to users.

Next post: Debugging RPG programs — using the ILE debugger, source-level debugging, and reading dumps.