Parallel processing with Transfer Tray

The following section deals with thread-safe and multi-core capable data transmission, which is provided by the TwinCAT 3 Condition Monitoring Library.

Asynchronous communication and parallel execution of computationally intensive steps

Condition Monitoring applications often require data sets of several megabytes in size, which increase the demands on computing time and power. The maximum permissible computing time is based on the cycle time, which must never be exceeded for drive controllers, for example. For this reason, multi-task software architectures for TwinCAT 3 Condition Monitoring applications are recommended in the case of computationally intensive algorithms. See Chapter "Task settings".

Idea of the transfer tray

Distributing algorithms over several tasks requires thread-safe implementations. The TwinCAT 3 Condition Monitoring Library provides an efficient and easy-to-use communication mechanism for this purpose, which largely eliminates the typical problems of locking and unlocking data. It supports parallel processing of data, e.g. at different data rates, and transfers array data between multiple tasks without errors by granting exclusive, synchronized access through queues based on the transfer tray. This also allows multi-core CPUs to be used without synchronization problems and prevents hard-to-diagnose errors such as blockages and inconsistencies caused by unsynchronized overwriting of numerical data.

The library function blocks must not be declared as global instances in a global variable list, because parallel write access to MultiArray buffers (see section MultiArray Handling) and parallel execution of the same function block instance are expressly prohibited.

Example of the necessity of cycle time transitions

In some circumstances, a sequential concept is not sufficient. This is always the case when the processing of a data set takes more time than the cycle time of a control task allows.

For example, the control task has a cycle time of 1 millisecond and records 20 samples per cycle by oversampling (equivalent to a sampling rate of 20 kHz). The signal processing requires a frequency resolution of 0.16 Hz. This may be necessary, for example, when analyzing large roller bearings in order to distinguish between defects in the inner and outer raceway, which run at only slightly different speeds.

The relationship between FFT length N, frequency resolution Δf and sampling rate fs is N = fs / Δf (for simplicity, a rectangular window is assumed here). This results in an FFT length of N = 20000 Hz / 0.16 Hz = 125000. In addition, the FFT length N' must be a power of two. Since log2(125000) = 16.93, the signal of length N is zero-padded to N' = 2^17 = 131072.
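The arithmetic above can be reproduced with a few lines of Python. This is only a worked check of the numbers given in the text; the variable names are chosen for illustration and are not part of the library.

```python
import math

samples_per_cycle = 20      # oversampling factor from the example
cycles_per_second = 1000    # control task cycle time of 1 ms
fs = samples_per_cycle * cycles_per_second   # sampling rate in Hz (20 kHz)
delta_f = 0.16              # required frequency resolution in Hz

# FFT length for a rectangular window: N = fs / delta_f
n = round(fs / delta_f)

# Next power of two that can hold N samples
exponent = math.ceil(math.log2(n))
n_padded = 2 ** exponent

# Number of zeros appended to the signal
n_zeros = n_padded - n

print(n, exponent, n_padded, n_zeros)   # prints: 125000 17 131072 6072
```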

The required computing time depends on the performance of the CPU, but the calculation definitely cannot be performed within the control task. On the other hand, the required input data corresponds to a signal segment of several seconds, so the calculation only has to be carried out infrequently.

Solution concept with the transfer tray

The high-performance solution provided by the Condition Monitoring Library is shown in the diagram below. The control task collects data in "packets" of 20 samples via the oversampling terminal (shown in blue in the diagram). These are stored in a buffer whose size corresponds to the length of the input buffer of the amplitude spectrum function block (125000 / 20 = 6250, shown in green in the diagram). Once the buffer is full, i.e. after 3125 cycles of the control task, its object reference is transferred by an asynchronous communication mechanism (FIFO principle) to a second task (processing task), which has a much longer cycle time of 20 milliseconds. According to the rule of thumb described in "Task settings", a maximum cycle time of 1562.5 ms is allowed for the processing task. The value of 20 ms clearly meets this requirement.

[Figure: Parallel processing with Transfer Tray]


This communication mechanism uses so-called atomic operations, which are secured by the hardware, to guarantee that only one task at a time has access to the corresponding buffer (hereinafter also referred to as MultiArray). This is similar to a transfer tray at a bank counter, which ensures that either the customer or the cashier (but not both simultaneously) can access its contents.

Response latency

Queues operate on the FIFO principle. For this reason, and because communication is asynchronous, the result is not available immediately. Responses with variable latency are possible.

The calculation result (the magnitude spectrum) is returned via a further queue, using the same communication mechanism, to the control task, which can then evaluate it further. It is also possible to pass the result to another, third task or to make it available in the computing task itself.
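The round trip described here (collect packets, hand over a full buffer, receive the result via a second queue) can be sketched in Python. This is an analogy only: Python's `queue.Queue` uses internal locks rather than the atomic operations of the transfer tray, the buffer length is shortened to 100 samples, and the sum of absolute values is a stand-in for the spectrum calculation. All names are hypothetical.

```python
import queue
import threading

PACKET = 20        # samples per control cycle (from the text)
BUFFER_LEN = 100   # shortened buffer length, assumption for this sketch

to_processing = queue.Queue()   # control task -> processing task
to_control = queue.Queue()      # processing task -> control task

def processing_task():
    """Slow task: takes full buffers in FIFO order and returns a result."""
    while True:
        buf = to_processing.get()
        if buf is None:         # sentinel: no more data
            break
        # Stand-in for the computationally intensive analysis
        to_control.put(sum(abs(x) for x in buf))

worker = threading.Thread(target=processing_task)
worker.start()

# Fast "control task": appends one packet per cycle
buffer = []
for cycle in range(15):
    buffer.extend([float(cycle)] * PACKET)
    if len(buffer) == BUFFER_LEN:
        to_processing.put(buffer)   # hand over the reference
        buffer = []                 # sender no longer touches the old buffer

to_processing.put(None)
worker.join()

# Results arrive later, in the order in which the buffers were sent
results = []
while not to_control.empty():
    results.append(to_control.get())

print(results)   # prints: [200.0, 700.0, 1200.0]
```

The sender gives up its reference as soon as the buffer is queued, so only one task works on a given buffer at any time, which mirrors the exclusive access the transfer tray guarantees.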

In general, unlike motion applications, the computing task is not subject to hard real-time conditions and can therefore be executed with a lower priority than the control task. The task management of the TwinCAT 3 system ensures that the task with the highest priority is always executed first, so that the real-time conditions of the control task can be fulfilled even with complex calculations.

The concept presented here can be used on both single-core and multi-core CPUs. The work can be distributed over many cores without central locks causing bottlenecks.

Timeout

The internal communication commands for the transfer tray may fail in rare cases, e.g. depending on the properties of the hardware. This happens, for example, when there is an empty buffer in the queue that cannot be removed because another task is currently accessing it. A synchronous timeout is specified for these commands, and a timeout error may occur as a result. The program must therefore always be prepared for the error state in which a buffer required for the continuity of the signal data is not available. Consequential errors such as data overflow and discontinuities in the analyzed time series must be handled consistently. As long as the input signal data of an analysis chain can be collected without errors, discontinuities do not occur. If a single timeout occurs in a downstream algorithm function block, or if no result MultiArray buffer is available for the downstream algorithm function block, neither input data nor result data are lost. They are transferred during the next call.
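The retry behavior described above (no data is lost on a timeout, and it is transferred during the next call) can be illustrated with a small Python sketch. It uses the standard `queue.Queue.get(timeout=...)` call as a stand-in for the library's internal commands; the function `cyclic_call` and the status strings are hypothetical and do not reflect the Tc3_CM interface.

```python
import queue

result_buffers = queue.Queue()  # free result buffers; empty at the start
pending = []                    # input data that could not be passed on yet
outputs = []                    # what the downstream step has produced

def cyclic_call(new_input, timeout_s=0.01):
    """One call of a downstream step; returns 'ok' or 'timeout'."""
    pending.append(new_input)
    try:
        buf = result_buffers.get(timeout=timeout_s)
    except queue.Empty:
        # No result buffer available in time: keep the input and
        # report the error state so that the caller can react to it.
        return "timeout"
    buf.clear()
    for packet in pending:      # transfer everything that was held back
        buf.extend(packet)
    pending.clear()
    outputs.append(list(buf))
    return "ok"

status1 = cyclic_call([1, 2])   # no buffer available yet
result_buffers.put([])          # a buffer becomes available
status2 = cyclic_call([3, 4])   # held-back and new data are both transferred

print(status1, status2, outputs)   # prints: timeout ok [[1, 2, 3, 4]]
```

The caller checks the returned status on every call instead of assuming success, which is the pattern the text asks for.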

How the transfer tray works

The transfer tray itself is represented by an internal function block provided by the Tc3_CM library. This function block is initialized with initial parameters that are defined in the global structure instance.

In the typical use of queues, exactly one task adds buffers to a queue with a fixed data stream identifier, and one specific other task removes these buffers for processing. The buffers are then sent back via another queue with a different identifier and reused. However, it is also no problem if several tasks have read or write access to the same queues, e.g. when analyzing statistical data.
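The circulation of buffers between two queues can be sketched as follows. This is a simplified, single-threaded Python illustration of the reuse pattern only; the queue names, the pool size of two and the buffer length of four bytes are assumptions for the example and have no counterpart in the library.

```python
import queue

free_q = queue.Queue()     # return path: processed buffers ready for reuse
filled_q = queue.Queue()   # forward path: buffers filled with new data

# Allocate the buffers once; afterwards they only circulate
pool = [bytearray(4) for _ in range(2)]
for buf in pool:
    free_q.put(buf)

seen_ids = set()   # identities of the buffers that were actually used
sums = []          # results computed by the consumer

for round_no in range(5):
    # Producer side: take a free buffer, fill it, send it on
    buf = free_q.get()
    buf[:] = bytes([round_no] * 4)
    filled_q.put(buf)

    # Consumer side: take the filled buffer, process it, send it back
    buf = filled_q.get()
    sums.append(sum(buf))
    seen_ids.add(id(buf))
    free_q.put(buf)

print(sums, len(seen_ids))   # prints: [0, 4, 8, 12, 16] 2
```

Five data sets pass through only two buffer objects, so no memory is allocated after initialization.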

The MultiArray buffers

So-called MultiArray buffers are used to communicate data via the transfer tray from one task to the next. These are explained in the chapter "Using the MultiArray feature".