Debugging RPG programs: the ILE debugger, source-level debugging, and reading dumps

Every RPG developer eventually writes a program that does something it should not. The calculation is wrong. The loop runs one too many times. The record that should be updated is not. The program crashes with a status code that tells you almost nothing about why.

Debugging is the skill that separates developers who fix problems efficiently from those who spend hours adding DSPLY statements and recompiling. IBM i has a capable source-level debugger built into the system, and once you know how to use it, you will find most bugs in minutes rather than hours.

This post covers the ILE debugger, how to set it up, the commands you need, and what to do when you cannot use an interactive debugger at all.

Before you can debug: compile with debug data

The debugger needs symbol information — the mapping between your source code lines and the compiled instructions. Without it, you can only see machine-level instructions, not your source.

Compile with debug data using the DBGVIEW parameter:

CRTBNDRPG PGM(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC) +
           DBGVIEW(*SOURCE)

CRTSQLRPGI OBJ(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC) +
            DBGVIEW(*SOURCE) COMMIT(*NONE)

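DBGVIEW controls which debug views the compiler creates. The main options:

*SOURCE: the debugger reads the source member at debug time. Nothing is copied into the object, so the member must still exist, unchanged, where it was when you compiled.
*LIST: the compiler listing is stored in the object. Works even if the source is moved or deleted.
*ALL: creates the source, copy, and listing views. Largest object, most flexible.
*STMT: the default for CRTBNDRPG. You can set breakpoints by statement number using a printed compiler listing, but no source is displayed.
*NONE: no debug data. Smallest object, no source-level debugging.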

For production objects, many shops compile with *NONE to keep object sizes down, then recompile with *SOURCE when they need to debug a specific problem. For development, always use *SOURCE or *LIST.

Also compile with optimisation off for debugging — optimisation can rearrange code in ways that make the debugger jump around unexpectedly:

CRTBNDRPG PGM(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC) +
           DBGVIEW(*SOURCE) OPTIMIZE(*NONE)

Starting the ILE debugger

The ILE debugger attaches to a running job. There are two modes:

Debug your own interactive job — start the debugger, then call the program you want to debug:

STRDBG PGM(MYLIB/MYPGM)
CALL MYPGM

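STRDBG puts your interactive job into debug mode and shows the module source, where you can set breakpoints. Press F12 to return to the command line, then call the program; execution stops at the first breakpoint it reaches. When you are finished, ENDDBG takes the job out of debug mode.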

Debug a batch job — attach the debugger to a running batch job from a separate interactive session:

STRSRVJOB JOB(123456/MYUSER/MYJOB)
STRDBG PGM(MYLIB/MYPGM)

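STRSRVJOB connects your interactive session to the batch job. The batch job pauses at breakpoints and you control it from your interactive session. To catch a job from its first statement, hold it on the job queue (HLDJOB) before starting the service job, then release it (RLSJOB) once the debugger is started. This is how you debug jobs that only misbehave in batch, or scheduled jobs you cannot run interactively. When you are finished, run ENDDBG followed by ENDSRVJOB.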

The debug display

Once in debug mode with a breakpoint hit, you see the source-level debug display — your source code with a cursor on the current line, and a command line at the bottom.

Key function keys:

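F6: add or clear a breakpoint at the cursor line
F10: step (execute one statement and stop), stepping over calls
F11: display the value of the variable under the cursor
F12: resume (run until the next breakpoint)
F13: work with module breakpoints (see all breakpoints set in the module)
F14: work with module list (switch between modules, add programs)
F17: watch the variable under the cursor
F22: step into a called procedure
F3: exit the display (at a breakpoint this ends the program); debug mode stays active until you run ENDDBG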

Breakpoints

A breakpoint pauses execution at a specific source line. When execution reaches a breakpoint, the debugger takes control and shows you the source at that point.

From the debug display: move the cursor to the line you want and press F6.

From the command line:

BREAK 150           -- Break at line 150 of the current module
BREAK 150 WHEN CustomerID = '000123'  -- Conditional breakpoint

Conditional breakpoints are one of the most powerful features. Instead of stopping at a line every time it is executed — which could be thousands of times in a loop — you stop only when a specific condition is true. This is how you find the one iteration out of ten thousand where the logic goes wrong.

-- Break only when the calculated total does not match expected
BREAK 287 WHEN CalcTotal <> ExpectedTotal

-- Break when a specific record is being processed
BREAK 156 WHEN OrderID = 98765

-- Break when an error indicator is set (the debugger treats indicators as one-character fields)
BREAK 201 WHEN *IN99 = '1'

Remove a breakpoint:

CLEAR 150           -- Remove breakpoint at line 150
CLEAR PGM           -- Remove all breakpoints in the program
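To practise conditional breakpoints, it helps to have a small program with a fault on exactly one loop iteration. The following is a minimal sketch, not taken from any real application: the program, its variable names, and the deliberate one-cent error are all invented for illustration. Statement numbers depend on your compile, so use the number the debugger shows for the marked line.

**FREE
// Hypothetical practice program: one iteration out of eleven goes wrong.
dcl-s OrderID       packed(7:0);
dcl-s Qty           packed(5:0);
dcl-s Price         packed(7:2) inz(2.50);
dcl-s CalcTotal     packed(11:2);
dcl-s ExpectedTotal packed(11:2);
dcl-s Msg           char(52);

for OrderID = 98760 to 98770;
  Qty = OrderID - 98759;
  ExpectedTotal = Qty * Price;
  CalcTotal = Qty * Price;

  // Deliberate bug: one order is off by a cent.
  if OrderID = 98765;
    CalcTotal += 0.01;
  endif;

  // Set the breakpoint on the IF below, for example:
  //   BREAK nn WHEN CalcTotal <> ExpectedTotal
  //   BREAK nn WHEN OrderID = 98765
  if CalcTotal <> ExpectedTotal;
    Msg = 'Mismatch on order ' + %char(OrderID);
    dsply Msg;
  endif;
endfor;

*inlr = *on;
return;

An unconditional breakpoint on that line stops eleven times. With either condition it should stop only on the faulty iteration, where EVAL CalcTotal and EVAL ExpectedTotal show the difference.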

Displaying and changing variables

While paused at a breakpoint, you can inspect and change any variable in scope.

Display a variable:

EVAL CustomerID
EVAL OrderTotal
EVAL CustRecord       -- Displays all fields in a data structure
EVAL CustRecord.Name  -- Display one field

Display an array:

EVAL MonthlyTotals          -- Shows all elements
EVAL MonthlyTotals(3)       -- Shows element 3
EVAL MonthlyTotals(1..12)   -- Shows elements 1 through 12

Change a variable to test different conditions:

EVAL OrderStatus = 'HOLD'    -- Change a character field
EVAL OrderTotal = 999.99     -- Change a numeric field
EVAL *IN50 = '1'             -- Set an indicator (indicators are one-character fields)

Changing variables mid-execution is invaluable for testing edge cases. Found the bug but want to verify your fix without recompiling? Change the variable to the correct value and let the program continue. If it now works correctly, you have confirmed the fix.

Watch breakpoints — stop when a variable changes

A watch breakpoint fires when a variable’s value changes, regardless of which line of code changed it. This is how you find where an unexpected value is being assigned.

WATCH OrderStatus              -- Break whenever OrderStatus changes
WATCH OrderTotal               -- Break whenever OrderTotal changes
CLEAR WATCH ALL                -- Remove all watches

Unlike BREAK, WATCH takes no WHEN condition. It fires on every change, and you inspect the new value with EVAL when the program stops.

Watch breakpoints are the answer to “something is setting this variable to the wrong value and I cannot find where.” Set the watch and let the program run; it stops as soon as the value changes, at the statement responsible.
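A watch is easiest to appreciate on a program where the assignment is hidden away from the code you are looking at. This is a small sketch under invented names (the ProcessLine procedure and the line-4 rule are hypothetical), in which a subprocedure quietly changes a global variable.

**FREE
// Hypothetical practice program: a global is changed somewhere unexpected.
ctl-opt dftactgrp(*no) actgrp(*new);

dcl-s OrderStatus char(4) inz('OPEN');
dcl-s i           int(10);

// Before resuming, enter on the debug command line:
//   WATCH OrderStatus
for i = 1 to 5;
  ProcessLine(i);
endfor;

dsply OrderStatus;
*inlr = *on;
return;

dcl-proc ProcessLine;
  dcl-pi *n;
    LineNo int(10) const;
  end-pi;

  // The "mystery" assignment the watch is meant to find.
  if LineNo = 4;
    OrderStatus = 'HOLD';
  endif;
end-proc;

With the watch set, resuming the program should stop inside ProcessLine on the fourth call, without any breakpoint having been placed there.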

Step through code

F10 steps one statement at a time, executing the current line and stopping at the next. This is how you trace through logic when you suspect the bug is in a specific section but you are not sure exactly where.

F22 steps into called procedures and service program calls. If the current line is a procedure call, F22 takes you inside that procedure rather than stepping over it.

F12 resumes execution until the next breakpoint. There is no command-line equivalent for resume; stepping is also available as a command:

STEP            -- Step one statement (same as F10)
STEP INTO       -- Step into a call (same as F22)
STEP OVER       -- Step over a call, stopping at the next line in current scope

Debugging service programs

When the bug is inside a service program procedure rather than the main program, add the service program to the debug session:

STRDBG PGM(MYLIB/MYPGM) SRVPGM(MYLIB/MYSRVPGM MYLIB/ERRSRVPGM)

Or add it after the session has started: press F14 (Work with module list), then use option 1 (Add program) with the service program name, its library, and type *SRVPGM.

Once added, you can set breakpoints in the service program source and step into its procedures from the calling program. Switch between modules with F14 to navigate to the service program source.

Reading a formatted dump

When a program crashes in production and you did not catch it with error handling, the system can produce a formatted dump — a snapshot of the program’s state at the moment of failure.

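Request a dump in the *PSSR or in the ON-ERROR section of a monitor block:

monitor;
  // ... the code that might fail ...
on-error *all;
  // Dump the program state to a spooled file
  dump(a);
  // LogError is your own logging procedure; %proc() needs a recent compiler level
  LogError('MYPGM' : %proc() : %status : 'Unhandled error: dump produced');
endmon;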

DUMP(A) writes a formatted dump to a spooled file named QPPGMDMP. The A extender forces the dump even when the module was compiled with DEBUG(*NO). Find it among the job's spooled files:

WRKSPLF SELECT(*CURRENT)

The dump shows every variable in the program at the time of the crash — all fields, all data structures, all indicators — with their current values. It is a complete snapshot of program memory.

What to look for in the dump:

The PSDS (Program Status Data Structure): shows the error status code, the program name, the statement number where the error occurred, and the last file operation performed
Variable values at crash time: find the variable that should have a valid value but is blank, zero, or corrupt
Indicators: check which indicators were on or off at the time of the crash
The file feedback area (INFDS) for each file: for file-related crashes, shows the last operation, the record format, and the file status code
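To get a dump to practise reading, you can force a failure in a throwaway program. The sketch below is an invented example, not production code: it divides by zero inside a monitor block and dumps from the ON-ERROR section. The variable names are hypothetical.

**FREE
// Hypothetical practice program: force an error and produce QPPGMDMP.
dcl-s Divisor packed(5:0) inz(0);
dcl-s Result  packed(9:2);
dcl-s Msg     char(52);

monitor;
  // Divide by zero raises program status 00102.
  Result = 100 / Divisor;
on-error *all;
  // (A) forces the dump regardless of the DEBUG keyword.
  dump(a);
  Msg = 'Status ' + %char(%status) + ', see QPPGMDMP';
  dsply Msg;
endmon;

*inlr = *on;
return;

After the call, the QPPGMDMP spooled file should list Divisor with a value of zero, which is the kind of suspicious value the checklist above tells you to look for.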

The Program Status Data Structure (PSDS)

For diagnosing crashes without an interactive debug session, the PSDS gives you information about the program’s state that you can access within the program itself:

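**FREE
dcl-ds ProgramStatus psds qualified;
  PgmName    char(10)  pos(1);    // procedure/program name
  StatusCode zoned(5)  pos(11);   // current status code
  PrevStatus zoned(5)  pos(16);   // previous status code
  StmtNum    char(8)   pos(21);   // source statement number
  Routine    char(8)   pos(29);   // RPG routine where the error occurred
  Parms      zoned(3)  pos(37);   // number of parameters passed
  ExcpType   char(3)   pos(40);   // message prefix, e.g. MCH or CPF
  ExcpNum    char(4)   pos(43);   // message number
end-ds;

dcl-s ErrInfo varchar(200);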

// In your *PSSR or on-error handler:
begsr *PSSR;
  ErrInfo = 'Program: '  + %trimr(ProgramStatus.PgmName)  +
            ' Routine: ' + %trimr(ProgramStatus.Routine)  +
            ' Status: '  + %char(ProgramStatus.StatusCode) +
            ' Statement: '+ %trimr(ProgramStatus.StmtNum);

  exec sql
    INSERT INTO MYLIB.CRASHLOG
      (LogTime, ErrDetail)
    VALUES(CURRENT_TIMESTAMP, :ErrInfo);

  *inlr = *on;
  return;
endsr;

The PSDS is available in every RPG program automatically; you only need to declare it. StatusCode is the status of the error that caused the crash. StmtNum is the source statement number. Routine names the RPG routine that was running (such as *DETC for detail calculations) or the program being called, not a subprocedure name. Together these fields usually tell you where and why the program failed, without needing to read a dump or attach a debugger.
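The PSDS also identifies the job, which is worth logging alongside the status code so you can locate the right job log afterwards. Here is a minimal standalone sketch. The subfield names JobName, JobUser, and JobNumber are my own choices; the positions (244, 254, 264) are the documented PSDS locations. In a real program you would add these subfields to your existing PSDS rather than declare a second one.

**FREE
// Sketch: build the qualified job name (number/user/name) from the PSDS.
dcl-ds ProgramStatus psds qualified;
  PgmName   char(10) pos(1);
  JobName   char(10) pos(244);
  JobUser   char(10) pos(254);
  JobNumber zoned(6) pos(264);
end-ds;

dcl-s QualJob char(28);

// Edit code X keeps the leading zeros of the six-digit job number.
QualJob = %editc(ProgramStatus.JobNumber : 'X') + '/' +
          %trimr(ProgramStatus.JobUser) + '/' +
          %trimr(ProgramStatus.JobName);

dsply QualJob;

*inlr = *on;
return;

The resulting string has the same number/user/name form that commands such as STRSRVJOB expect, so it can be stored with the crash record and pasted straight into later diagnostics.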

Debugging embedded SQL errors

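SQL errors do not set %ERROR or call *PSSR; they set SQLCODE and SQLSTATE. Both are fields of the SQLCA (SQL Communication Area), which the SQL precompiler declares automatically in every SQLRPGLE module, so you must not declare it yourself. The fields most useful for diagnostics sit at these positions:

// Reference layout only. The precompiler already provides SQLCA;
// this template just documents where the key fields live.
dcl-ds CaLayout qualified template;
  SQLCode    int(10)  pos(13);    // negative = error, 100 = no data
  SQLErrml   int(5)   pos(17);    // length of the message tokens
  SQLErrmc   char(70) pos(19);    // message tokens
  SQLErrp    char(8)  pos(89);    // product/module that detected the error
  SQLState   char(5)  pos(132);   // standard five-character state
end-ds;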

For simpler diagnostics, check SQLCODE and use the GET DIAGNOSTICS statement for the full error message:

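// Names beginning with SQL are reserved for the database manager,
// so the host variables use other prefixes.
dcl-s ErrText   varchar(500);
dcl-s SavedCode int(10);

exec sql
  UPDATE MYLIB.ORDERPF SET Status = 'PROC' WHERE OrderID = :OrderID;

if SQLCODE < 0;
  // Later SQL statements, GET DIAGNOSTICS included, overwrite SQLCODE
  SavedCode = SQLCODE;

  exec sql GET DIAGNOSTICS CONDITION 1
    :ErrText = MESSAGE_TEXT;

  LogError('MYPGM' : 'UpdateOrder' : SavedCode : ErrText);
endif;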

GET DIAGNOSTICS returns the full human-readable error message from DB2 — far more useful than the raw SQLCODE number when diagnosing an unexpected SQL failure.
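Checking only for SQLCODE < 0 misses warnings and the "no rows" case. The first two characters of SQLSTATE give a standard class that covers them. This is a small sketch of a helper that classifies a state value; the procedure name StateClass and its return strings are invented for illustration. It compiles as plain RPG so it can be tried without a database.

**FREE
// Sketch: classify an SQLSTATE by its two-character class.
ctl-opt dftactgrp(*no) actgrp(*new);

dcl-s Msg char(52);

Msg = '02000 -> ' + StateClass('02000');
dsply Msg;
Msg = '23505 -> ' + StateClass('23505');
dsply Msg;

*inlr = *on;
return;

dcl-proc StateClass;
  dcl-pi *n varchar(10);
    State char(5) const;
  end-pi;

  select;
    when %subst(State : 1 : 2) = '00';   // successful completion
      return 'OK';
    when %subst(State : 1 : 2) = '01';   // warning
      return 'WARNING';
    when %subst(State : 1 : 2) = '02';   // no data found
      return 'NO DATA';
    other;                               // every other class is an error
      return 'ERROR';
  endsl;
end-proc;

In an SQLRPGLE program you would pass SQLSTATE to the helper straight after each statement and log only the WARNING and ERROR results.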

When you cannot use the debugger

Sometimes you cannot attach the debugger — the problem only happens in production, or it only appears under specific load conditions you cannot reproduce interactively. In those cases:

Trace logging — add temporary log writes at key points to trace program flow:

exec sql INSERT INTO MYLIB.TRACELOG VALUES(CURRENT_TIMESTAMP, 'MYPGM', 'Before chain', :OrderID);

DUMP(A) in production — add it to your error handler as shown above. The dump appears in the job’s output queue after the crash.

Job log analysis — enable verbose logging for the job and query the job log with SQL after the failure:

SELECT MESSAGE_ID, MESSAGE_TEXT, MESSAGE_TIMESTAMP, FROM_PROGRAM
  FROM TABLE(QSYS2.JOBLOG_INFO('123456/MYUSER/MYJOB')) AS X
  WHERE MESSAGE_TYPE IN ('ESCAPE', 'DIAGNOSTIC')
  ORDER BY MESSAGE_TIMESTAMP DESC

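QSYS2.BOUND_MODULE_INFO: check what was compiled into the object if you are not sure whether the right version is deployed. For ILE programs the source details are recorded per bound module, so this view is a better fit than QSYS2.PROGRAM_INFO. Column names can differ by release, so confirm them with a SELECT * first:

SELECT PROGRAM_LIBRARY, PROGRAM_NAME, BOUND_MODULE,
       SOURCE_FILE_LIBRARY, SOURCE_FILE, SOURCE_FILE_MEMBER,
       SOURCE_CHANGE_TIMESTAMP,
       MODULE_CREATE_TIMESTAMP
  FROM QSYS2.BOUND_MODULE_INFO
  WHERE PROGRAM_NAME = 'MYPGM'
    AND PROGRAM_LIBRARY = 'MYLIB'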

“It worked yesterday” problems often turn out to be version problems — the wrong version of the program is deployed, or the source was changed after the object was compiled.

Next post: Source control for IBM i — managing your RPG and CL source with Git and RDi.