IBM i Performance Tuning in 2026: Memory Pools, Reading WRKACTJOB, Collection Services, and Diagnosing Slow Batch Jobs

The previous post covered Node.js on IBM i — building REST APIs that expose IBM i data and programs. This post covers performance: understanding how IBM i uses memory, how to read the tools that show you what the system is doing, and how to diagnose the causes of slow jobs and batch processes.

IBM i performance tuning is different from performance tuning on Linux or Windows because the resource model is different. IBM i uses a concept called memory pools to allocate main storage to groups of jobs. The storage model, the subsystem structure, and the way the system queues work all affect performance in ways that are specific to IBM i. Understanding the model is the prerequisite for reading the tools correctly.

Memory pools and auxiliary storage pools

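IBM i divides main storage (RAM) into memory pools. Each pool has a defined allocation of storage (which the system can resize automatically when QPFRADJ allows it), and the jobs running in a subsystem draw storage from the pools that subsystem is configured to use. When a pool runs short of storage, the system writes pages out to disk (auxiliary storage) and jobs fault them back in when they are next needed. On many IBM i systems, excessive paging rather than CPU contention is the primary performance problem.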

System-defined memory pools:

  • *MACHINE — Licensed Internal Code and core operating system functions. The pool always exists and cannot be deleted; its size is set by the QMCHPOOL system value.
  • *BASE — all jobs not assigned to a private pool. The default for most subsystems.
  • *INTERACT — interactive jobs. On most systems, given a dedicated pool to prevent batch jobs from starving interactive users.
  • *SPOOL — spooled file (print queue) processing.

Shared and private pools: Subsystems can use shared system pools (*BASE, *INTERACT) or have dedicated private pools. High-volume batch subsystems benefit from private pools to prevent resource contention.

-- Current pool allocation and faulting (sizes are reported in megabytes)
SELECT POOL_NAME, SUBSYSTEM_NAME, DEFINED_SIZE,
       CURRENT_SIZE, ELAPSED_TOTAL_FAULTS
FROM   QSYS2.MEMORY_POOL_INFO
ORDER  BY CURRENT_SIZE DESC;

Key system values affecting performance

QMCHPOOL — machine pool size. Size it by its fault rate rather than by a fixed percentage of total storage: a commonly cited guideline is to keep machine pool faults below about 10 per second. Too small and the OS itself pages, causing system-wide degradation.

QBASPOOL — minimum size of the base pool, in kilobytes. This is the floor below which *BASE cannot drop when storage is moved to other pools.

QPFRADJ — performance adjustment. Set to 2 on most production systems so that the system adjusts both pool sizes and activity levels, at IPL and continuously afterwards. Values:

  • 0 — no automatic adjustment
  • 1 — adjustment at IPL only
  • 2 — adjustment at IPL and continuous automatic adjustment (recommended)
  • 3 — continuous automatic adjustment, without the adjustment at IPL

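QMAXACTLVL — maximum activity level of the system: the maximum number of threads, across all pools, that can compete for main storage and CPU at the same time. The activity level of the *BASE pool itself is set by QBASACTLVL. Setting an activity level too low creates wait conditions; too high causes thrashing.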

SELECT SYSTEM_VALUE_NAME, CURRENT_NUMERIC_VALUE, CURRENT_CHARACTER_VALUE
FROM   QSYS2.SYSTEM_VALUE_INFO
WHERE  SYSTEM_VALUE_NAME IN ('QPFRADJ','QMAXACTLVL','QBASACTLVL',
                              'QMCHPOOL','QBASPOOL')
ORDER  BY SYSTEM_VALUE_NAME;

Reading WRKSYSSTS correctly

WRKSYSSTS (Work with System Status) is the primary screen for a snapshot of IBM i system status. Reset the counters first so that the rates cover a known interval, then read the key metrics described below.

WRKSYSSTS RESET(*YES)    /* Reset statistics counters for a clean baseline */

% CPU used — total CPU utilisation. On a well-tuned production system, this should stay below 70% at peak. Above 80% sustained means CPU is a constraint; add capacity or reduce workload.

DB fault/s, DB pages — database page faults and database pages per second, reported for each pool. A fault occurs when a job needs a database page that is not in main storage. High DB faults (above 20–30/s as a rough guide for an interactive system) indicate that the pool is undersized relative to the working set of the jobs running in it.

Non-DB fault/s — non-database page faults. High non-DB faults indicate the base or interactive pool is undersized.

Active to wait, Wait to Ineligible — transitions through the job state machine. A high “Active to Wait” rate means jobs are frequently waiting on I/O or other resources. “Wait to Ineligible” means the pool activity level is capping the number of active threads — increase the pool’s activity level or its storage allocation.
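The rough thresholds above can be collected into a small checklist. The following Python sketch is only an illustration: the `SystemSnapshot` type, its field names, and the choice of 30 faults/s as the cut-off are assumptions made for the example, and the figures are typed in by hand from the WRKSYSSTS screen rather than read from the system.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class SystemSnapshot:
    """Hypothetical container for figures copied from WRKSYSSTS."""
    cpu_pct: float            # % CPU used
    db_faults_per_sec: float  # DB fault/s for the pool being reviewed
    wait_to_inel: float       # Wait to Ineligible transitions for that pool


def review_snapshot(snap: SystemSnapshot) -> list[str]:
    """Apply the rule-of-thumb thresholds from the text to one snapshot."""
    findings = []
    if snap.cpu_pct > 80:
        findings.append("CPU above 80%: a constraint if sustained")
    elif snap.cpu_pct >= 70:
        findings.append("CPU at or above 70%: over the peak target")
    if snap.db_faults_per_sec > 30:  # assumed cut-off, top of the 20-30/s guide
        findings.append("DB faults above 30/s: pool may be undersized")
    if snap.wait_to_inel > 0:
        findings.append("Wait to Ineligible above zero: activity level is capping threads")
    return findings


if __name__ == "__main__":
    for line in review_snapshot(SystemSnapshot(85.0, 42.0, 0.0)):
        print(line)
```

A single snapshot only says where to look next; the same checks are more meaningful when repeated over several refreshes of the screen.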

Reading WRKACTJOB

WRKACTJOB (Work with Active Jobs) shows all active jobs on the system. The columns that matter for performance diagnosis:

WRKACTJOB SBS(*ALL)     /* Show all subsystems */

Opt — select option 5 (Work with) on any job to drill into its details, including the job log.

Status — the most important column:

  • RUN — actively using CPU
  • DEQW — waiting on a data queue
  • EVTW — waiting on an event (semaphore waits are shown separately as SEMW)
  • SELW — waiting on a select() call (network I/O)
  • DSKW — waiting on disk I/O — investigate if many jobs show this
  • LCKW — waiting on an object lock — immediate investigation priority
  • THDW — thread wait
  • MSGW — waiting for operator reply to a message

Diagnosing a LCKW:

-- Find which jobs hold, or are waiting for, locks on an object
SELECT JOB_NAME, LOCK_STATE, LOCK_STATUS, LOCK_SCOPE
FROM   QSYS2.OBJECT_LOCK_INFO
WHERE  OBJECT_SCHEMA = 'ORDLIB'
  AND  OBJECT_NAME   = 'ORDHDR'
  AND  OBJECT_TYPE   = '*FILE'
ORDER  BY LOCK_STATUS, JOB_NAME;

Collection Services

Collection Services is IBM i’s built-in performance data collector. It runs as a system service and collects hundreds of performance metrics at configurable intervals into a performance database.

Starting Collection Services:

STRPFRCOL    /* Start collection with the configured settings (shipped default interval: 15 minutes) */

/* For problem investigation, configure a shorter interval (in minutes) */
CFGPFRCOL INTERVAL(1)

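Viewing Collection Services data via SQL: when database files are created for a collection, the data lands in the QAPM* files (QAPMSYSTEM, QAPMJOBL, QAPMDISK and others) in the collection library, QPFRDATA by default. The query below shows the shape of an interval query; the view and column names in it are illustrative and need to be mapped to the QAPM* file and fields that hold the metric on your release.

-- CPU utilisation over the last hour (illustrative object and column names)
SELECT INTSTTSP AS INTERVAL_START,
       CPUPCT   AS CPU_PCT,
       DSKRDS   AS DISK_READS,
       DSKWRTS  AS DISK_WRITES
FROM   QSYS2.SYSPERFCOL
WHERE  INTSTTSP > CURRENT_TIMESTAMP - 1 HOUR
ORDER  BY INTSTTSP;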

IBM Navigator for i provides graphical views of Collection Services data — CPU, disk I/O, memory pool utilisation, and network throughput — over configurable time ranges. This is the tool to use for identifying whether a performance problem is resource constraint (CPU, memory, disk) or a workload pattern issue.
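Interval data is also what makes a word like "sustained" testable: a single snapshot cannot distinguish a brief spike from a lasting constraint. A minimal Python sketch, assuming the per-interval CPU percentages have already been extracted into a list (the 80% threshold comes from the guideline earlier in this post; the three-interval minimum and the sample values are arbitrary choices for the example):

```python
def sustained_runs(samples, threshold=80.0, min_intervals=3):
    """Find runs of consecutive samples above threshold.

    Returns a list of (start_index, length) tuples, one per run that
    lasts at least min_intervals samples.
    """
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_intervals:
                runs.append((start, i - start))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_intervals:
        runs.append((start, len(samples) - start))
    return runs


if __name__ == "__main__":
    cpu = [60, 85, 90, 88, 70, 95, 96]  # hypothetical interval values
    print(sustained_runs(cpu))
```

With one-minute intervals, a run of length 3 means three minutes above the threshold; the minimum should be chosen to match the collection interval in use.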

Diagnosing slow batch jobs

Step 1 — Check if the job is actually running: In WRKACTJOB, is the job in RUN status? If it is in DSKW constantly, the disk subsystem is a bottleneck. If it is in LCKW, another job holds a lock it needs.

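Step 2 — Check the job log: Use option 5 on the job in WRKACTJOB and then option 10 (Display job log), or use DSPJOBLOG. Look for CPF messages, SQL messages (SQL prefix), or timing messages. If the job runs with debug messages enabled (for example under STRDBG), the query optimizer also writes informational messages to the job log. In particular, watch for:

  • CPI4329 — arrival sequence access was used for a file (a table scan instead of an index)
  • CPI4321 / CPI4322 — a temporary access path (index) was built for a query
  • CPI432F — access path suggestion (the optimizer is advising an index)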

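Step 3 — Check SQL performance with the Plan Cache: The plan cache is normally examined through the SQL Performance Center in IBM i Access Client Solutions, or by dumping it to a snapshot file with the QSYS2.DUMP_PLAN_CACHE procedure and querying that file. The query below shows the kind of question to ask of that data; its view and column names are illustrative.

-- Find the slowest SQL statements run by a specific user (illustrative names)
SELECT QUERY_TEXT,
       AVG_TIME_PER_RUN / 1000 AS AVG_SECONDS,
       RUN_COUNT,
       ROWS_PROCESSED
FROM   QSYS2.SYSQRYSLT
WHERE  QUERY_CREATOR = 'BATCHJOB'
ORDER  BY AVG_TIME_PER_RUN DESC
FETCH FIRST 10 ROWS ONLY;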

Step 4 — Index Advisor:

-- Check if DB2 is recommending indexes for your queries
SELECT SYSTEM_TABLE_NAME, MTI_USED, MTI_CREATED, TIMES_ADVISED,
       KEY_COLUMNS_ADVISED
FROM   QSYS2.SYSIXADV
WHERE  SYSTEM_TABLE_SCHEMA = 'ORDLIB'
ORDER  BY TIMES_ADVISED DESC
FETCH FIRST 20 ROWS ONLY;

If the Index Advisor shows a frequently advised index that does not exist, create it. For SQL-based batch jobs this is often the single most impactful performance action.
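Advice is easier to act on when it is ranked. The Python sketch below assumes the Index Advisor rows have already been fetched into dictionaries; the dictionary keys, the `ORDNO`, `LINENO` and `CUSTNO` columns, the `_IX1` naming suffix, and the cut-off of 100 advisories are all hypothetical. It produces draft CREATE INDEX statements for review, not statements to run unexamined.

```python
def draft_index_ddl(schema, table, key_columns, suffix="IX1"):
    """Build a draft CREATE INDEX statement from a comma-separated key list."""
    cols = ", ".join(col.strip() for col in key_columns.split(","))
    return f"CREATE INDEX {schema}.{table}_{suffix} ON {schema}.{table} ({cols})"


def rank_advice(rows, min_times_advised=100):
    """Keep frequently advised indexes, most advised first."""
    frequent = [r for r in rows if r["times_advised"] >= min_times_advised]
    return sorted(frequent, key=lambda r: r["times_advised"], reverse=True)


if __name__ == "__main__":
    # Hypothetical advice rows, not output from a real system.
    advice = [
        {"table": "ORDLIN", "keys": "ORDNO, LINENO", "times_advised": 4200},
        {"table": "ORDHDR", "keys": "CUSTNO", "times_advised": 12},
    ]
    for row in rank_advice(advice):
        print(draft_index_ddl("ORDLIB", row["table"], row["keys"]))
```

Before creating anything, it is worth checking whether an existing index already covers the same leading key columns, since every extra index adds maintenance cost to inserts and updates.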

Disk I/O and auxiliary storage pools

IBM i organises disk storage into Auxiliary Storage Pools (ASPs). The system ASP (ASP 1) holds the OS and most libraries. User ASPs (ASPs 2–32) and Independent ASPs (iASPs, ASPs 33–255) can hold application data.

Separating high-I/O database files to a dedicated ASP (with its own disk drives) reduces contention with system paging and OS activity. This is relevant for large IBM i installations with significant concurrent batch I/O.

-- Check ASP utilisation
SELECT ASP_NUMBER, ASP_TYPE, TOTAL_CAPACITY,
       TOTAL_CAPACITY_AVAILABLE,
       ROUND((TOTAL_CAPACITY - TOTAL_CAPACITY_AVAILABLE) * 100.0
             / NULLIF(TOTAL_CAPACITY, 0), 1) AS PCT_USED
FROM   QSYS2.ASP_INFO
ORDER  BY ASP_NUMBER;

Common performance anti-patterns

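Full table scans in batch loops: An RPG program that loops over all records in ORDHDR and does a CHAIN into ORDLIN for each one performs at least one random I/O per record, and an SQL lookup inside the same loop with no index on the join key is worse still, because each lookup becomes a scan. On a million-record file, this can be the difference between a 2-minute batch job and a 4-hour one. Move the join to SQL as a single set-based statement and make sure an index exists on the join key.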
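The arithmetic behind that gap is worth making explicit. A short Python sketch that converts a total run time into the average time spent per record, using the 2-minute and 4-hour figures from the paragraph above (the function name is hypothetical, and the result is a back-of-envelope average, not a measurement):

```python
def ms_per_record(total_seconds, records):
    """Average elapsed milliseconds per record for a batch run."""
    return total_seconds * 1000 / records


if __name__ == "__main__":
    records = 1_000_000
    slow = ms_per_record(4 * 60 * 60, records)  # the 4-hour run
    fast = ms_per_record(2 * 60, records)       # the 2-minute run
    print(f"4-hour run:   {slow:.2f} ms per record")
    print(f"2-minute run: {fast:.2f} ms per record")
    print(f"ratio: {slow / fast:.0f}x")
```

An average in the region of 14 ms per record is consistent with at least one physical random read for every record on spinning disk, while 0.12 ms per record is only plausible when most rows arrive through sequential or blocked access.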

Open Data Path (ODP) reuse: When an RPG program opens the same file repeatedly in a loop, each full open creates a new ODP. Sharing the ODP (OVRDBF SHARE(*YES), scoped with OVRSCOPE(*JOB) where appropriate) or restructuring to open once outside the loop avoids repeated ODP creation.

Commitment control overhead: Running SQL updates under commitment control with COMMIT(*CHG) holds row locks until the commit and adds commit-cycle entries to the journal. For large batch updates where rollback is not needed, COMMIT(*NONE) removes that overhead; changes to a journaled file are still journaled either way. Use deliberately: this trades recovery capability for performance.

Interactive jobs doing heavy batch work: Interactive jobs run in the interactive pool, which is sized for response time, not throughput. Submit heavy processing with SBMJOB to a batch subsystem with appropriate pool sizing.

IBM i performs extraordinarily well when jobs run in correctly sized pools, when SQL uses indexes, and when batch workloads are designed for throughput rather than adapted from interactive patterns. The diagnostic path is always the same: WRKSYSSTS for resource constraints, WRKACTJOB for job-level state, the plan cache and Index Advisor for SQL, and Collection Services for historical trend analysis.

Next post: Advanced DB2 for i SQL — Materialized Query Tables, the SQE versus CQE split, index statistics, and the SQL patterns that matter most for query performance on IBM i.
