Most RPG programs are written to handle the happy path. The file exists, the record is found, the calculation succeeds, the API responds. When something goes wrong, many programs either crash with a cryptic system message, silently continue with bad data, or dump the user into a raw system error screen that tells them nothing useful.
Good error handling is not defensive paranoia — it is professionalism. A program that fails gracefully tells the user what went wrong in plain language, logs enough detail for a developer to diagnose the problem, cleans up any resources it was holding, and exits cleanly. This post covers the techniques to make that happen in RPGLE.
The two error handling models in RPG
RPGLE has two distinct approaches to error handling that you will use in different contexts:
The *PSSR subroutine and INFSR — the traditional RPG error handling model. A special subroutine that the runtime calls automatically when an unhandled error occurs. Still valid and still used, especially in older codebases.
The %ERROR built-in and the (E) operation extender: operation-level error trapping. You tell a specific operation not to crash on error, then check %ERROR afterwards. Fine-grained control without needing a global error handler.
In modern free-format RPG, you will typically use a combination: %ERROR for operations where you expect errors might occur, and *PSSR as a safety net for anything unexpected.
%ERROR and the (E) operation extender
For file operations and several other operation codes, adding the (e) extender tells RPG not to raise an exception if the operation fails, and to set %ERROR instead:
**FREE
// Read with error handling
read(e) CUSTMAST CustRecord;
if %error;
  // Read failed: handle it
  dsply 'Error reading customer file';
  *inlr = *on;
  return;
endif;
if %eof(CUSTMAST);
  // End of file: a normal condition, not an error
  leave;
endif;
The (e) extender works on most file operations: read(e), write(e), update(e), delete(e), chain(e), open(e), close(e).
For operations that do not support (e), wrap the code in a monitor block (see below). System APIs are a separate case: many of them accept an error code data structure parameter that plays the same role.
%STATUS — what went wrong
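After %ERROR is set, %STATUS tells you what the error was. It returns a numeric code. File status codes of 1000 and above are errors; the lower codes are normal conditions that are reported through %EOF and %FOUND rather than %ERROR:
- 11: end of file on a read (check %EOF)
- 12: no record found on a CHAIN, SETLL, or SETGT (check %FOUND)
- 1021: duplicate key on a write
- 1211: I/O operation on a file that is not open
- 1218: record locked by another job
- 1221: update attempted without a prior read
- 1222: record cannot be allocated because of a referential constraint error
- 1251: permanent I/O error
- 1255: session or device error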
chain(e) CustomerID CUSTMAST CustRecord;
if %error;
  select;
    when %status = 1218;
      // Record locked: another job has it
      dsply 'Customer record is in use. Please try again.';
    other;
      // Unexpected error
      dsply 'Unexpected error: ' + %char(%status);
  endsl;
  *inlr = *on;
  return;
endif;
// Not found is not an error: CHAIN sets %found to *off instead of %error
if not %found(CUSTMAST);
  dsply 'Customer not found: ' + %char(CustomerID);
  *inlr = *on;
  return;
endif;
Monitor blocks: structured exception handling
The monitor block is the modern equivalent of the old *PSSR subroutine for handling errors in a specific section of code. It works like try/catch in other languages.
monitor;
  // Code that might fail
  chain CustomerID CUSTMAST CustRecord;
  if not %found(CUSTMAST);
    dsply 'Customer not found';
    return;
  endif;
  update CUSTREC CustRecord; // CUSTREC is an assumed record format name
on-error 1218;
  // Specifically handle record lock
  dsply 'Record locked by another user';
on-error 1021;
  // Duplicate key
  dsply 'Duplicate record: customer already exists';
on-error *FILE;
  // Any other file error
  dsply 'File error: ' + %char(%status);
on-error *ALL;
  // Catch-all for anything not caught above
  dsply 'Unexpected error occurred: ' + %char(%status);
  // Log, clean up, and exit
endmon;
on-error clauses match by status code, by error class (*PROGRAM for program status codes 00100 to 00999, *FILE for file status codes 01000 to 09999), or *ALL to catch everything. The first matching clause runs, just like a select/when structure.
Monitor blocks can be nested. The inner block handles what it can; anything it does not handle bubbles up to the outer block.
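A minimal sketch of that bubbling behaviour, written as a standalone program so it needs no files. The variable names are made up for illustration, and the divide by zero stands in for any error the inner block does not expect. The inner block only lists status 1218, so the divide-by-zero exception (program status 00102) passes through it and is caught by the outer block.

```rpgle
**FREE
ctl-opt dftactgrp(*no);

// Hypothetical working fields for the demonstration
dcl-s Numerator int(10) inz(10);
dcl-s Divisor   int(10) inz(0);
dcl-s Result    int(10);
dcl-s Msg       char(52);

monitor;
  monitor;
    // Raises program status 00102 (divide by zero) at run time
    Result = Numerator / Divisor;
  on-error 1218;
    // Inner handler only covers record locks, so it does not match here
    Msg = 'Record locked';
    dsply Msg;
  endmon;
on-error 102;
  // The exception the inner block did not handle arrives here
  Msg = 'Divide by zero, status ' + %char(%status);
  dsply Msg;
endmon;

*inlr = *on;
return;
```

Keeping the inner block narrow is the point of the design: it handles the one condition it knows how to recover from and leaves everything else to a handler further out.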
The *PSSR subroutine — the safety net
*PSSR is a special subroutine that RPG calls automatically when an unhandled program error occurs, meaning anything not caught by a monitor block or an (e) extender. File errors reach it only if the file declaration names it with INFSR(*PSSR). It is your last line of defence.
**FREE
ctl-opt dftactgrp(*no) actgrp(*caller);
// ... main program logic ...
// The safety net: runs on any unhandled program error
begsr *PSSR;
  // %status holds the error code
  // %proc() gives the name of the procedure that is running
  ErrRoutine = %proc();
  ErrStatus = %status;
  ErrMsg = 'Program error in ' + ErrRoutine +
           ': status ' + %char(ErrStatus);
  // Log to a file or send a message
  exec sql
    INSERT INTO MYLIB.ERRORLOG
      (LogTime, Program, Routine, ErrStatus, ErrMessage)
    VALUES(CURRENT_TIMESTAMP,
           'MYPGM',
           :ErrRoutine,
           :ErrStatus,
           :ErrMsg);
  // Send an escape message to the caller
  // (this ends the program and tells the caller something went wrong)
  *inlr = *on;
  return;
endsr;
%proc() returns the name of the procedure that is currently running, which helps pin down which part of a large program failed. RPG has no %routine built-in; on releases without %proc(), the program status data structure (PSDS) offers the *PROC and *ROUTINE subfields instead.
INFSR — file-specific error handling
Each file in an RPG program can have its own error handling subroutine, defined with the INFSR keyword on the file declaration:
dcl-f CUSTMAST usage(*update) keyed infsr(CustFileErr);
dcl-f ERRORLOG usage(*output);
// ... main logic ...
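begsr CustFileErr;
  // Called automatically when a CUSTMAST operation fails,
  // provided the operation has no (e) extender and is not inside a monitor block
  // %status(CUSTMAST) holds the file status code
  if %status(CUSTMAST) = 1218;
    // Record locked: an INFSR cannot resume the failed operation,
    // so record the condition and end like any other failure
    RecordLocked = *on;
    ErrMsg = 'CUSTMAST record locked by another job';
  else;
    ErrMsg = 'CUSTMAST error: ' + %char(%status(CUSTMAST));
  endif;
  // log it...
  *inlr = *on;
  return;
endsr;
INFSR gives you file-level granularity, which is useful when different files in the same program need different error responses. When you need to retry or continue after a failure, use (e) or a monitor block around the operation instead.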
Sending messages to the caller
When a program encounters an error it cannot recover from, the professional response is to send an escape message to its caller rather than just setting *INLR and returning. An escape message signals that the program failed — the caller can either handle it or let it propagate up the call stack.
dcl-pr SendEscapeMsg extpgm('QMHSNDPM');
MsgID char(7) const;
MsgFile char(20) const;
MsgData char(256) const;
MsgDataLen int(10) const;
MsgType char(10) const;
CallStkEnt char(10) const;
CallStkCnt int(10) const;
MsgKey char(4);
ErrorCode char(256) options(*varsize);
end-pr;
dcl-s MsgKey char(4);
dcl-s ErrCode char(256) inz(*loval); // Bytes provided = 0: API errors are signalled as exceptions
dcl-s MsgText varchar(256) inz('Order processing failed - see job log for details');
// Send an escape message to the caller
SendEscapeMsg(
  'CPF9898' :               // Generic escape message ID
  'QCPFMSG   QSYS      ' :  // Message file (10 characters) + library (10 characters)
  MsgText :
  %len(MsgText) :           // Actual length of the message data
  '*ESCAPE' :
  '*PGMBDY' :
  1 :
  MsgKey :
  ErrCode
);
The caller, whether a CL program, another RPG program, or a menu, receives this escape message and can handle it with MONMSG (in CL) or a monitor block (in RPG). Without an escape message, the caller has no way to know the called program failed unless it checks a return parameter explicitly.
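On the RPG side, the handling might look like the sketch below. It assumes the failing program is called ORDERPGM (a hypothetical name) and takes no parameters. An escape message from the called program surfaces in the caller as program status 00202 (called program or procedure failed), which an on-error clause can match.

```rpgle
**FREE
ctl-opt dftactgrp(*no);

// ORDERPGM is a hypothetical program that sends an escape message on failure
dcl-pr ProcessOrders extpgm('ORDERPGM');
end-pr;

dcl-s Msg char(52);

monitor;
  ProcessOrders();
on-error 202;
  // 00202: the called program or procedure failed
  Msg = 'ORDERPGM failed - see job log';
  dsply Msg;
on-error *program;
  // Any other program-class error, for example the program was not found
  Msg = 'Call error, status ' + %char(%status);
  dsply Msg;
endmon;

*inlr = *on;
return;
```

A real caller would usually log the failure and decide whether to continue or pass the error further up, rather than only displaying a message.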
Building a reusable error logging procedure
Rather than scattering error handling logic through every program, centralise it in a service program procedure:
**FREE
// In a service program: ERRSRVPGM
ctl-opt nomain;
dcl-proc LogError export;
dcl-pi *n;
Program varchar(10) const;
Routine varchar(256) const;
ErrCode int(10) const;
ErrMsg varchar(500) const;
end-pi;
exec sql
INSERT INTO MYLIB.APPLOG
(LogTime, LogProgram, LogRoutine, LogErrCode, LogMessage, LogUser, LogJob)
VALUES(
CURRENT_TIMESTAMP,
:Program,
:Routine,
:ErrCode,
:ErrMsg,
USER,
JOB_NAME
);
end-proc;
Every program in your application calls LogError when something goes wrong. The log table accumulates all errors in one place, queryable by program, by time, by user, or by error code.
// Using it in any program
on-error *all;
  // %proc() names the current procedure (RPG has no %routine built-in)
  LogError('ORDERPGM' : %proc() : %status : 'Failed to update order record');
  // send escape message and exit
endmon;
Lock wait handling: the practical pattern
Record lock conflicts (%status = 1218) deserve special attention because they are common in multi-user environments and require a retry strategy rather than an immediate failure:
// Prototypes belong with the declarations, not inside the loop
dcl-pr usleep int(10) extproc('usleep');
  microseconds uns(10) value;
end-pr;
dcl-s RetryCount int(5) inz(0);
dcl-s MaxRetries int(5) inz(3);
dcl-s LockWait int(10) inz(500); // milliseconds
dow RetryCount <= MaxRetries;
  chain(e) CustomerID CUSTMAST CustRecord;
  if not %error;
    leave; // Got it: exit the retry loop
  endif;
  if %status = 1218 and RetryCount < MaxRetries;
    RetryCount += 1;
    // Wait before retrying
    usleep(LockWait * 1000); // Convert ms to microseconds
  else;
    // Exceeded retries or a different error
    LogError('MYPGM' : 'MainProc' : %status : 'Lock wait exceeded after ' + %char(RetryCount) + ' retries');
    *inlr = *on;
    return;
  endif;
enddo;
Three retries with a half-second wait between them handles the vast majority of transient lock conflicts without bothering the user. Each attempt also waits for the file's own record wait time (WAITRCD) before status 1218 is reported, so keep that value short if you retry in code.
What good error handling looks like end to end
Pulling it all together — a procedure that does it right:
dcl-proc ProcessOrder export;
dcl-pi *n ind;
OrderID packed(9:0) const;
end-pi;
dcl-s Success ind inz(*on);
monitor;
  // Validate the order exists
  chain OrderID ORDERPF OrderRec;
  if not %found(ORDERPF);
    LogError('ORDERPGM' : 'ProcessOrder' : %status : 'Order ' + %char(OrderID) + ' not found');
    return *off; // Nothing was changed, so there is nothing to roll back
  endif;
  // Update status
  OrderRec.Status = 'PROC';
  OrderRec.ProcessedTime = %timestamp();
  update ORDERREC OrderRec; // ORDERREC is an assumed record format name
  exec sql COMMIT;
on-error 1218;
  LogError('ORDERPGM' : 'ProcessOrder' : 1218 : 'Order ' + %char(OrderID) + ' locked');
  exec sql ROLLBACK;
  Success = *off;
on-error *all;
  LogError('ORDERPGM' : 'ProcessOrder' : %status : 'Unexpected error processing order ' + %char(OrderID));
  exec sql ROLLBACK;
  Success = *off;
endmon;
return Success;
end-proc;
Every path that changes data either succeeds and commits, or fails and rolls back. The I/O operations deliberately omit (e): an operation coded with (e) reports failure through %ERROR and never reaches the on-error clauses. The caller gets a clear boolean result. The log table has a record of every failure. No silent data corruption, no cryptic system errors surfaced to users.
Next post: Debugging RPG programs — using the ILE debugger, source-level debugging, and reading dumps.