The IBM Cúram Batch Framework enables batch processing functionality to be written and executed from within IBM Cúram.
Batch processing functionality is implemented in individual Batch Processes. A Batch Process represents a single job such as DetermineProductDeliveryEligibility or GenerateInstructionLineItems.
There are two types of Batch Process – Single Threaded and Streamed.
- Single Threaded Batch Processes typically process simple tasks or small workloads. All processing is done in a single transaction by a single process and a failure typically causes the whole batch to fail and the transaction to roll back.
- Streamed Batch Processes use the Chunking and Streaming features of the IBM Cúram Batch framework to enable parallel processing. Streamed Batch Processes have two components:
- Chunker: The Chunker has business logic to determine what work needs to be performed and divides that work up into Chunks. Once all Chunks are processed the Chunker will typically output a report to indicate what was processed and whether any failures occurred.
- Streams[1]: Streams are specific to a particular Chunker. The Streams have business logic to process the Chunks created by the Chunker. Streams process each Chunk in a single transaction. The number of Streams to use per Batch Process depends on many factors, including the amount of work to process (i.e. the number of Chunks), the type of work being processed and the available system resources.
A Chunk is a single unit of work to be processed. Chunks contain the record or records to be processed in a single transaction. The Chunker creates the Chunks based upon a Chunk Size configuration (which is configurable per batch). The Chunk Size governs how many items are contained within each Chunk.
Example: A Batch Process which reassesses cases has a Chunk Size of 5; this means that each Chunk contains 5 case IDs representing the 5 cases to be reassessed as part of that Chunk. Those 5 cases will be reassessed in the same transaction when a Stream processes the Chunk.
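The chunking step can be pictured with a short sketch (illustrative Python, not actual Cúram framework code; the function name and inputs are assumptions for the example):

```python
def create_chunks(record_ids, chunk_size):
    """Divide the records to be processed into fixed-size Chunks,
    mirroring how the Chunker splits work by the Chunk Size setting."""
    return [record_ids[i:i + chunk_size]
            for i in range(0, len(record_ids), chunk_size)]

# 12 case IDs with a Chunk Size of 5 yields Chunks of 5, 5 and 2 records.
chunks = create_chunks(list(range(1, 13)), 5)
print(chunks)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
```

Each inner list then plays the role of one Chunk: everything in it is processed by a Stream in a single transaction.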
If an exception occurs while processing a Single Threaded Batch Process, the batch fails and the transaction will roll back[2].
If an exception occurs while processing a Chunk, the Stream will:
1. Mark the record as having thrown an error
2. Roll the transaction back
3. Restart processing the Chunk, this time skipping the record which threw the error
This flow repeats until the entire Chunk is processed.
Example: A Stream is processing a Chunk
with a size of 5. It starts a transaction, processes record 1 and 2 and then
record 3 throws an exception. At this point the Stream rolls the transaction
back and starts processing again. This time it processes record 1, 2, skips 3
and attempts 4. If 4 were to throw an exception the Stream would roll the
transaction back for the second time and restart processing record 1, 2,
skipping 3 and 4 and then processing 5. When all records are processed the
Stream moves onto the next Chunk. It continues doing this until all Chunks are
processed.
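The roll-back-and-skip behaviour in that example can be sketched as follows (illustrative Python, not Cúram code; the transaction is simulated with a simple list that is discarded on each restart):

```python
def process_chunk(records, process):
    """Process a Chunk in one 'transaction', restarting and skipping
    any record that throws, until the whole Chunk goes through."""
    skipped = set()
    while True:
        committed = []          # stand-in for work done in the transaction
        try:
            for rec in records:
                if rec in skipped:
                    continue    # skip records that failed on earlier attempts
                process(rec)
                committed.append(rec)
        except Exception:
            skipped.add(rec)    # mark the failing record as in error...
            continue            # ...roll back and restart the Chunk
        return committed, skipped

def flaky(rec):
    """Hypothetical business logic where records 3 and 4 always fail."""
    if rec in (3, 4):
        raise ValueError(f"record {rec} failed")

# The Chunk restarts twice (for records 3 and 4) and ends with 1, 2, 5 done.
done, errors = process_chunk([1, 2, 3, 4, 5], flaky)
print(done, errors)  # [1, 2, 5] {3, 4}
```

Resetting `committed` on every attempt is what models the transaction rollback: only a pass that reaches the end of the Chunk without an exception "commits".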
Streams read Chunks
from the BatchProcessChunk table which contains the chunks to be processed,
whether they’ve been processed and the instanceID of the batch they relate to. InstanceIDs
are used to tie Streams to the work created by their associated Chunker. The
instanceID is what Streams use to know which Chunks to process (as it is
possible to execute multiple Streamed batches of different types in parallel).
The BatchProcessChunk
table can be used to determine how much work is remaining and how quickly the
Chunks are being processed.
The following SQL can be used to determine how many Chunks have been processed:
SELECT Count(*),
       instanceid,
       status
FROM   batchprocesschunk
GROUP  BY instanceid,
          status;
BatchProcessChunk
records with a status of BPCS1 have not been processed. BatchProcessChunk
records with a status of BPCS2 have been processed. Adding the total number of
BPCS1 and BPCS2 records will give you the total number of Chunks for this
batch.
The following SQL can be used to determine the pace of a job (i.e. the number of Chunks being processed per minute):
SELECT Trunc(lastwritten) AS LastWritten,
To_char(lastwritten, 'HH24')
|| ':'
|| To_char(lastwritten, 'MI') AS TIME,
status,
Count(*)
FROM batchprocesschunk
WHERE instanceid = '<INSTANCE_ID>'
AND status = 'BPCS2'
GROUP BY Trunc(lastwritten),
To_char(lastwritten, 'HH24'),
To_char(lastwritten, 'MI'),
status
ORDER BY 1 DESC,
2 DESC,
3 DESC;
Replace <INSTANCE_ID> with the instanceID of the Batch Process of interest before executing this SQL.
Streams initially go
into a waiting state when launched. In this state the Stream polls the
BatchProcess table looking for records corresponding to its instanceID. The
Stream will poll indefinitely until it finds work to process. When the Stream
finds a record in the BatchProcess table corresponding to its instanceID it
will transition to a processing state. In this state the Stream will process
Chunks until there are no more Chunks to process. At that point the Stream will
terminate.
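The Stream lifecycle described above amounts to a simple two-state loop: wait (poll for a BatchProcess record matching the instanceID), then process Chunks until none remain. A minimal sketch (illustrative Python; the tables are modelled as in-memory collections, and a poll cap is added so the sketch terminates, whereas a real Stream polls indefinitely):

```python
def run_stream(instance_id, batch_process_table, chunk_queue, max_polls=100):
    """Wait until a BatchProcess record for our instanceID appears,
    then drain the Chunk queue and terminate."""
    # Waiting state: poll for a record matching our instanceID.
    polls = 0
    while instance_id not in batch_process_table:
        polls += 1
        if polls >= max_polls:   # real Streams wait forever; capped here
            return []
    # Processing state: take Chunks until there are none left.
    processed = []
    while chunk_queue:
        processed.append(chunk_queue.pop(0))
    return processed             # no Chunks remain: the Stream terminates

result = run_stream("42", {"42"}, [[1, 2], [3, 4]])
print(result)  # [[1, 2], [3, 4]]
```

This also illustrates the failure mode below: a Stream whose instanceID never shows up in the BatchProcess table stays in the waiting loop forever.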
Example: A Chunker is launched with 5
Streams. Stream launches are staggered by 5 seconds to avoid database
contention when bootstrapping. By the time the 5th Stream is
launched and is ready to process the other 4 streams have already processed the
Chunks and the Chunker has cleaned everything up. In this scenario the 5th
Stream will wait indefinitely and will need to be manually terminated, unless
another run of the batch is planned.
Points to note about Streams:
- If a Stream is started and the Chunker has not finished chunking, the Stream will wait
- If the Stream doesn't find any Chunks when launched, it will wait indefinitely
- Streams will continue processing Chunks until there are no more Chunks to process
- When there are no more Chunks to process, Streams will terminate themselves
- Streams do not require the Chunker to be running, so they will continue processing if the Chunker is terminated mid-way through a job
- If a Stream is terminated while processing, the Chunk it is currently processing will be rolled back and will not be processed
- If a Stream is terminated and then restarted before all Chunks are processed, it will continue processing, starting with the next available Chunk
- Streams cache information from the database when launched, so when launching multiple Streams it is best to stagger them to avoid database contention
The Chunker monitors
(polls) Chunk status during execution. When all Chunks have been processed it will
perform any post-processing required by business rules and then clean up the
BatchProcessChunk and associated tables. The Chunker may also output a report
detailing the number of records processed, failures and any skipped Chunks. The
quality of this report is largely dependent on the developer of the Batch
Process.
If the Chunker is
terminated while Streams are processing Chunks no cleanup will be performed. In
this scenario if the Chunker is restarted and no manual cleanup has occurred,
the Chunker will continue to monitor the progress of processed Chunks as if
nothing has happened. If the Chunker detects Chunks of its instanceID when
launched it will not truncate the batch control tables and re-chunk.
In order to restart a
Streamed Batch Process from the start after a mid-processing termination, the batch
control tables need to be purged using the following SQL:
DELETE FROM batchprocesschunk WHERE instanceid = '<INSTANCE_ID>';
DELETE FROM batchchunkkey WHERE instanceid = '<INSTANCE_ID>';
DELETE FROM batchprocess WHERE instanceid = '<INSTANCE_ID>';
COMMIT;
Replace <INSTANCE_ID> with the instanceID of the Batch Process of interest before executing this SQL.
DB-to-JMS is a feature
of the Cúram Batch Framework which allows batch processes access to Cúram JMS queues.
DB-to-JMS works by intercepting messages sent to the Cúram JMS messaging queues
and storing them on a database table (JMSLiteMessage). At the end of batch
processing the Batch Launcher will trigger a call to the DB-to-JMS servlet
(running in an Application Server) which will initiate a deferred process to
transfer messages stored in the JMSLiteMessage table to their JMS queue.
If the configured
Application Server is not accessible when the Batch Launcher attempts to call
the DB-to-JMS servlet an exception will occur and the batch process will appear
to fail. This failure can be misleading as the DB-to-JMS call is made in a
separate transaction from the batch processing, so in this instance the batch
processing has actually succeeded. The entries in the JMSLiteMessage table for
this batch will be processed the next time a call to the DB-to-JMS servlet is
successfully made.
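This store-and-forward behaviour can be illustrated with a sketch (illustrative Python, not the actual DB-to-JMS implementation; the JMSLiteMessage table and the JMS queue are modelled as lists):

```python
class DbToJms:
    """Messages accumulate in a 'JMSLiteMessage table' during batch
    processing and only move to the queue when the servlet call succeeds;
    a failed call leaves them in place for the next attempt."""

    def __init__(self):
        self.jms_lite_table = []   # stand-in for the JMSLiteMessage table
        self.queue = []            # stand-in for the target JMS queue

    def intercept(self, message):
        """Intercept a message sent during batch processing."""
        self.jms_lite_table.append(message)

    def trigger_transfer(self, app_server_up):
        """Model the Batch Launcher calling the DB-to-JMS servlet."""
        if not app_server_up:
            raise ConnectionError("DB-to-JMS servlet unreachable")
        self.queue.extend(self.jms_lite_table)
        self.jms_lite_table.clear()

bridge = DbToJms()
bridge.intercept("msg-1")
bridge.intercept("msg-2")
try:
    bridge.trigger_transfer(app_server_up=False)  # batch appears to fail...
except ConnectionError:
    pass                                          # ...but the messages survive
bridge.trigger_transfer(app_server_up=True)       # next successful call drains them
print(bridge.queue)  # ['msg-1', 'msg-2']
```

The key point the sketch captures is that the failed servlet call does not lose anything: the batch work and the stored messages are in separate transactions, so the messages are simply forwarded on the next successful call.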
DB-to-JMS
functionality is available for Single Threaded and Streamed batch processes.
More information on DB-to-JMS functionality and how to enable and configure it
can be found in the IBM Cúram Documentation Centre.
[1] Chunkers can be
configured to run as Streams, although we tend to disable this ability to avoid
confusion.
[2] This is typically by
design as catching and handling exceptions could lead to inconsistent data.