File splitting

The adapter supports an optional file-splitting feature that reduces memory load during event processing. When this feature is enabled, the adapter divides large event files into smaller chunks and posts each chunk to the endpoint separately.

The adapter splits large event files into several business objects, also called chunks, based on the value you specify in the SplitCriteria property, which can be either a delimiter or a chunk size. Each business object is delivered to the endpoint separately. You can split files using a delimiter when the content of the business object has a definite structure; for example, if you have a customer business object with elements such as name, address, and city. You can also split files by size when the business object contains unstructured data, such as plain text or binary files.

When event files are split into such chunks, each chunk creates a business object. This means that the value specified for the PollQuantity property and the number of business objects delivered to the endpoint can be different. When file splitting based on a delimiter is enabled, the PollQuantity activation specification property specifies the number of such event files that are present in the event store, and the class used to split the event file is set in the SplittingFunctionClassName activation specification property.

The adapter does not reassemble the chunked data.

The value specified in the SplitCriteria property determines which splitting method is used. The default value of the SplitCriteria property is zero, which means that no splitting is performed. If no splitting is required, you can also leave the SplitCriteria and SplittingFunctionClassName properties empty.

You can optionally provide a custom file splitter class. Set the SplittingFunctionClassName property to the name of the class.


File splitting by delimiter

When one or more characters such as a comma (,), semicolon (;), quote (",'), brace ({}) or slash (/ \) delimiters are used to separate the business objects in a file, the adapter can split the file into smaller chunks based on the delimiter. Each chunk is a logical unit used to construct a business object when forwarded to IBM BPM or WebSphere Enterprise Service Bus.

You define the delimiter that separates the business objects in the file in the SplitCriteria property.
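The idea of delimiter-based splitting can be illustrated with a minimal sketch. This is not the adapter's implementation: the function name split_by_delimiter and the sample content are hypothetical, the delimiter is treated as a plain literal string, and any special syntax that the adapter accepts in the SplitCriteria value is not modeled.

```python
def split_by_delimiter(content: str, delimiter: str) -> list[str]:
    """Split event-file content on a literal delimiter.

    Empty pieces (for example, the one after a trailing delimiter)
    are dropped, so each returned chunk stands for one business object.
    """
    return [chunk for chunk in content.split(delimiter) if chunk.strip()]


# Hypothetical event file: two customer records separated by "##".
event_file = "name=Ann,city=Oslo##name=Bob,city=Rome##"
chunks = split_by_delimiter(event_file, "##")
print(chunks)
```

Each element of chunks corresponds to one business object that would be delivered to the endpoint separately.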

To demonstrate how the PollQuantity value works with delimiter file splitting, consider two event files. The first event file contains one business object and the second event file contains two business objects. If the PollQuantity value is 2, the business object from the first event file and the first business object from the second event file are sent in the first poll cycle. The second business object from the second event file is sent in the second poll cycle.
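The two-file scenario can be reproduced with a small simulation. It is only one reading that is consistent with the example above, under the assumption that business objects are taken in file order and grouped into batches of PollQuantity; the function poll_cycles and the labels are hypothetical and do not reflect the adapter's internal code.

```python
from itertools import islice


def poll_cycles(event_files, poll_quantity):
    """Yield the business objects sent in each poll cycle.

    event_files is a list of files, each a list of business objects.
    Assumption: objects are consumed in file order, poll_quantity at a time.
    """
    pending = (bo for business_objects in event_files for bo in business_objects)
    while True:
        batch = list(islice(pending, poll_quantity))
        if not batch:
            return
        yield batch


# First file holds one business object, second file holds two.
files = [["file1-BO1"], ["file2-BO1", "file2-BO2"]]
cycles = list(poll_cycles(files, poll_quantity=2))
for number, batch in enumerate(cycles, start=1):
    print(f"poll cycle {number}: {batch}")
```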

The following rules apply to the use of delimiters:

An example of a scenario with the commonly used delimiter format is shown in Table 2.

Table 2. Example of a scenario with a delimiter format

Data binding: XML

BO content:
<?xml version="1.0" encoding="UTF-8"?>
<customer:Customer xsi:type="customer:Customer"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:customer="http://www.ibm.com/xmlns/prod/websphere/j2ca/flatfile/customer">
<CustomerName>Deepa</CustomerName>
<Address>IBM</Address>
<City>Bangalore</City>
<State>KA</State>
</customer:Customer>
##

Recommended delimiter format: ##;\n


File splitting by size

The value specified in the SplittingFunctionClassName property determines whether a file is split by size. If the SplittingFunctionClassName property is set to com.ibm.j2ca.utils.filesplit.SplitBySize, the SplitCriteria property must contain a valid number that represents the maximum file size, in bytes. If the event file is larger than the value specified in the SplitCriteria property, the file is split into chunks and each chunk is posted to the endpoint separately. If the event file is smaller than the SplitCriteria value, the entire event file is posted to the endpoint.
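The size-based behavior can be sketched as follows. This is an illustration only, not the code of com.ibm.j2ca.utils.filesplit.SplitBySize; the function name split_by_size is hypothetical, and treating a non-positive size as "no splitting" is an assumption based on the default SplitCriteria value of zero.

```python
def split_by_size(data: bytes, max_bytes: int) -> list[bytes]:
    """Split data into chunks of at most max_bytes bytes.

    Assumption: a value of zero or less means no splitting, so the
    whole content is returned as a single chunk.
    """
    if max_bytes <= 0:
        return [data]
    return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]


# A 5-byte file split by 2 bytes: the last chunk is smaller than the split size.
print(split_by_size(b"ABCDE", 2))
# A file smaller than the split size is delivered whole.
print(split_by_size(b"ABC", 10))
```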

When event files are split into chunks, each chunk becomes a business object. This means that the value specified for the PollQuantity property and the number of business objects delivered to the endpoint can be different. Although the adapter polls according to the PollQuantity value, it actually processes the business objects in the file one at a time.

For example, if an event file is chunked into three parts, one file is polled and the three business objects are delivered to the endpoint (because each chunk creates a single business object).

If you use the FileChangeNotification property, the size of the event file must be a multiple of the split size.

For example, for an event file that contains 90 bytes, the split size can be 15, 6, 3, or 2.
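The requirement amounts to a divisibility check, which can be sketched as follows. The helper is_valid_split_size is hypothetical and is not part of the adapter; the candidate sizes 4 and 7 are added only to show values that fail the check.

```python
def is_valid_split_size(file_size: int, split_size: int) -> bool:
    """Return True when file_size is an exact multiple of split_size."""
    return split_size > 0 and file_size % split_size == 0


# For a 90-byte event file, keep only the candidate sizes that divide it evenly.
candidates = (15, 6, 3, 2, 4, 7)
valid = [size for size in candidates if is_valid_split_size(90, size)]
print(valid)
```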

When the size of the event file is not a multiple of the split size, the last business object is smaller than the split size, and the adapter still delivers it to the endpoint correctly during the first event poll. However, if new content is then appended to the event file and the FileChangeNotification property is set to True, the business object that was smaller than the split size is updated, but its new content is not sent to the endpoint. The following example describes this configuration for content that is split by 2 bytes.

When the content "ABCDE" is split by 2 bytes, so that the last business object contains only "E", then the adapter delivers the contents "AB", "CD", and "E" to the endpoint during the first event poll. In the next event poll, if the content is changed to:

When an event file contains failed business objects and file splitting by size is enabled, then the event file is archived with the .fail extension in the specified archive directory.

At the endpoint, the adapter does not reassemble the chunked data into a single file, but it provides information about the chunks to enable IBM BPM or WebSphere Enterprise Service Bus to reassemble them into a single file. The chunk information is included in the ChunkFileName property of the FlatFileInputStreamRecord record, and includes the chunk size in bytes and the event ID. The event ID of a chunk uses the following form: eventFileLocation_/_timestampStr_/_MofN, where M is the current chunk number and N is the total number of chunks.

For example: C:\flatfile\eventdir\eventfile.in_/_2005_01_10_10_17_49_864_/_3of5, where timestampStr has the following format: year_month_day_hour_minutes_seconds_milliseconds.
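An endpoint that wants to rebuild the original file could use the MofN part of the event ID to order the chunks, as in the following sketch. The functions parse_event_id and reassemble are hypothetical, the payloads are invented, and the sketch assumes that chunk numbers start at 1 and that all chunks of one event file are passed in together.

```python
def parse_event_id(event_id: str):
    """Split 'eventFileLocation_/_timestampStr_/_MofN' into its parts."""
    location, timestamp, position = event_id.rsplit("_/_", 2)
    current, total = position.split("of")
    return location, timestamp, int(current), int(total)


def reassemble(chunks):
    """Join (event_id, payload) pairs, in any order, into one bytes object."""
    parsed = [(parse_event_id(event_id), payload) for event_id, payload in chunks]
    if not parsed:
        raise ValueError("no chunks supplied")
    total = parsed[0][0][3]
    ordered = sorted(parsed, key=lambda item: item[0][2])
    # Assumption: chunks are numbered 1..N with no gaps or duplicates.
    if [info[2] for info, _ in ordered] != list(range(1, total + 1)):
        raise ValueError("missing or duplicate chunks")
    return b"".join(payload for _, payload in ordered)


prefix = r"C:\flatfile\eventdir\eventfile.in_/_2005_01_10_10_17_49_864_/_"
received = [(prefix + "2of3", b"CD"), (prefix + "3of3", b"E"), (prefix + "1of3", b"AB")]
print(reassemble(received))
```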


Chunk information in WebSphere Adapter for Flat Files, version 7.5

With WebSphere Adapter for Flat Files, version 7.5, the event ID does not contain the total business object count. Therefore, by default the total business object count is no longer part of the chunk information being sent to the endpoint. The format of the event ID is changed to: EventID=AbsolutePathOfEventFileNameInLocalEventDirectory_/_YYYY_MM_DD_HH_mm_ss_SSS_/_currentBONumber, where the YYYY_MM_DD_HH_mm_ss_SSS string represents year_month_day_hour_minutes_seconds_milliseconds.

Optionally, you can include the total business object count in the chunk information by using the includeBOCountInChunkInfo property. When you enable the includeBOCountInChunkInfo property, the total business object count is included in the chunk information being sent to the endpoint.

The chunk information has the following format when you enable the includeBOCountInChunkInfo property:

AbsolutePathOfEventFileNameInLocalEventDirectory_/_YYYY_MM_DD_HH_mm_ss_SSS_/_currentBONumberofTotalBOCount
For example, the chunk information can be: C:\flatfile\eventdir\c5.txt_/_2010_11_17_14_35_34_509_/_4of5

The chunk information has the following format when you disable the includeBOCountInChunkInfo property:

AbsolutePathOfEventFileNameInLocalEventDirectory_/_YYYY_MM_DD_HH_mm_ss_SSS_/_currentBONumber
For example, the chunk information can be: C:\flatfile\eventdir\c5.txt_/_2010_11_17_14_35_34_509_/_4
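Because the total count is optional in version 7.5, code that reads the chunk information has to accept both forms. The following sketch shows one way to do that with a regular expression; the pattern, the function parse_chunk_info, and the returned dictionary keys are hypothetical and are derived only from the two formats shown above.

```python
import re
from datetime import datetime

# path _/_ YYYY_MM_DD_HH_mm_ss_SSS _/_ currentBONumber, optionally followed by "of" and the total
CHUNK_INFO = re.compile(
    r"^(?P<path>.+)_/_(?P<ts>\d{4}(?:_\d{2}){5}_\d{3})_/_"
    r"(?P<current>\d+)(?:of(?P<total>\d+))?$"
)


def parse_chunk_info(value: str) -> dict:
    """Parse chunk information; 'total' is None when the count is not included."""
    match = CHUNK_INFO.match(value)
    if match is None:
        raise ValueError(f"unrecognized chunk information: {value!r}")
    total = match["total"]
    return {
        "path": match["path"],
        # %f reads the three millisecond digits as a fraction of a second.
        "timestamp": datetime.strptime(match["ts"], "%Y_%m_%d_%H_%M_%S_%f"),
        "current": int(match["current"]),
        "total": int(total) if total is not None else None,
    }


with_count = parse_chunk_info(r"C:\flatfile\eventdir\c5.txt_/_2010_11_17_14_35_34_509_/_4of5")
without_count = parse_chunk_info(r"C:\flatfile\eventdir\c5.txt_/_2010_11_17_14_35_34_509_/_4")
print(with_count["current"], with_count["total"])
print(without_count["current"], without_count["total"])
```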

See Include total business object count in the ChunkInfo (includeBOCountInChunkInfo).


Related reference:

Activation specification properties