Multitask, Concurrent Execution and OpenMP
In Simulink® you can configure your models to run on multicore target systems; further details can be found in the MathWorks® documentation. Beckhoff targets usually offer a multicore architecture, which can be used efficiently with TwinCAT 3. This is also possible with the TwinCAT Target for Simulink®, as shown below.
A distinction is made in this description between Multitask, Concurrent Execution and OpenMP.
- With Multitask, a TcCOM object is created that provides several task contexts. All tasks must run on the same core; the execution is not parallelized.
- With Concurrent Execution, a TcCOM object with multiple tasks is also created, but these tasks can be distributed across different cores. Calculations can thus actually be executed in parallel.
- With OpenMP, a TcCOM object with a single task context is created. In addition, multiple JobTasks distributed across different cores can execute the code fragments generated as OpenMP code in parallel.
Multitask and Concurrent Execution
The following multirate system in Simulink® is considered for the description of the Multitask and Concurrent Execution options. The model has an explicit and an implicit rate transition.
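As an illustration, a comparable multirate model could be created from the MATLAB® command line as sketched below. The model name, the block selection, the 10 ms / 50 ms / 100 ms sample times and the use of the AutoInsertRateTranBlk option for the implicit rate transition are assumptions of this sketch and are not taken from the model shown above.

mdl = 'MultirateExample';
new_system(mdl); open_system(mdl);
set_param(mdl, 'SolverType', 'Fixed-step');   % fixed-step solver for the multirate model

% 10 ms source feeding a 50 ms gain via an explicit Rate Transition block
add_block('simulink/Sources/Sine Wave', [mdl '/Sine10ms'], 'SampleTime', '0.01');
add_block('simulink/Signal Attributes/Rate Transition', [mdl '/RT']);
add_block('simulink/Math Operations/Gain', [mdl '/Gain50ms'], 'SampleTime', '0.05');
add_block('simulink/Sinks/Out1', [mdl '/Out50ms']);
add_line(mdl, 'Sine10ms/1', 'RT/1');
add_line(mdl, 'RT/1', 'Gain50ms/1');
add_line(mdl, 'Gain50ms/1', 'Out50ms/1');

% Second branch at 100 ms without a Rate Transition block; with
% 'Automatically handle rate transition for data transfer' enabled,
% this connection becomes an implicit rate transition.
set_param(mdl, 'AutoInsertRateTranBlk', 'on');
add_block('simulink/Math Operations/Gain', [mdl '/Gain100ms'], 'SampleTime', '0.1');
add_block('simulink/Sinks/Out1', [mdl '/Out100ms']);
add_line(mdl, 'Sine10ms/1', 'Gain100ms/1');
add_line(mdl, 'Gain100ms/1', 'Out100ms/1');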
Go to Configuration Parameters and select Solver. Here you can choose between the following options; a command-line sketch for setting them follows the list:
- Treat each discrete rate as separate task
- Allow tasks to execute concurrently on target
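Both options can also be set from the MATLAB® command line. The following sketch assumes the Simulink® configuration parameter names EnableMultiTasking and ConcurrentTasks as well as the model name from the sketch above.

mdl = 'MultirateExample';

% Treat each discrete rate as separate task (Multitask)
set_param(mdl, 'EnableMultiTasking', 'on');

% Additionally allow tasks to execute concurrently on target (Concurrent Execution)
set_param(mdl, 'ConcurrentTasks', 'on');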
Treat each discrete rate as separate task: Multitask
If a TcCOM object is created with the Treat each discrete rate as separate task option enabled, you get an object to which you can assign multiple task contexts; in this case, three tasks.
The inputs, outputs and all other DataAreas are divided among the different contexts, so in this case there are three Input DataAreas and three Output DataAreas.
In this case, the cyclic tasks must all be placed on the same core. There is no parallel processing of the tasks.
The advantage over a TcCOM object with only one task interface is that not all calculations have to be completed within the fastest task cycle time any more (see Scheduling). If the above Simulink model were created with the default setting, i.e. without Treat each discrete rate as separate task, only one task with 10 ms (the fastest rate) would be linkable, which means that all calculations would have to be completed within this time. By distributing the calculations to multiple tasks on the same core, this restriction no longer applies, because the tasks can interrupt each other (see Priorities).
Properties:
- No function block is supported in the PLC.
- The TwinCAT Usermode Runtime is not supported.
- All tasks are assigned to the same core.
- The fastest task must be assigned the highest priority (smallest priority value), the second fastest task the second highest priority, and so on.
Scheduling Details:
The graphic below shows an example of how the computing time can be distributed. The hatched areas indicate that a task cannot run during this time because a higher-priority task is active. The solid blue areas indicate that the task is working. Note that the areas were only overlaid on the real-time monitor image afterwards to aid comprehension; they are not part of the actual recording.
- Task 2, Task 3, and Task 4 are executed sequentially on the same core in Tick 1. The execution of Task 2 and Task 3 runs without interruption. The execution of Task 4 is interrupted by the higher priority Task 2 in the transition to Tick 2.
- Task 2 is executed first in Tick 2. The execution of Task 4 is resumed after Task 2 is completed.
- Task 2 starts again and Task 3 follows in Tick 3.
If cycle time overruns occur and scheduling cannot be adhered to, the execution of the respective task context is skipped until all relevant contexts are in the appropriate state. In the TcCOM object this behavior can be observed via the online parameter SkippedExecutionCount.
Allow tasks to execute concurrently on target: Concurrent Execution
If a TcCOM object is created with the Allow tasks to execute concurrently on target option enabled, you likewise get an object to which you can assign multiple task contexts; in this case, as in the example above, three tasks.
Again, the DataAreas are separated into the different contexts. The difference from the multitask object is that the tasks can now be distributed across different cores, so that processing is actually parallelized.
Properties:
- No function block is supported in the PLC.
- The TwinCAT Usermode Runtime is supported.
- Tasks can be assigned to different cores.
- The fastest task must be assigned the highest priority (smallest priority value), the second fastest task the second highest priority, and so on.
Scheduling Details:
The graphic below shows an example of how the computing time can be distributed. The solid blue areas indicate that the task is working. Note that the areas were only overlaid on the real-time monitor image afterwards to aid comprehension; they are not part of the actual recording.
- Task 2, Task 3 and Task 4 are executed in parallel on different cores in Tick 1. The execution of Task 2 must be completed by the start of Tick 2.
- Task 2 is executed again in Tick 2. Task 3 and Task 4 may continue to work. The execution of Task 2 and Task 3 must be completed by the start of Tick 3.
- Task 2 and Task 3 are executed again in Tick 3. Task 4 may continue to work. The execution of Task 2 and Task 4 must be completed by the start of Tick 4.
If cycle time overruns occur and scheduling cannot be adhered to, the execution of the respective task context is skipped until all relevant contexts are in the appropriate state. In the TcCOM object this behavior can be observed via the online parameter SkippedExecutionCount.
OpenMP
The Simulink Coder™ or the MATLAB Coder™ can generate OpenMP code. Please refer to the MathWorks® documentation for the exact cases in which this happens.
The following is an example using a MATLAB® Function block in Simulink®. A MATLAB® example can be found in conjunction with the TE1401 TwinCAT Target for MATLAB® among the examples: TwinCAT.ModuleGenerator.Samples.Start('Code parallelization with OpenMP').
The parfor command is used to parallelize the for loop in the MATLAB® function. In this case, the number of parallel workers is limited to 4.
function y = MyFunction(u) %#codegen
% Initialize a 20x50 matrix and a constant offset
A = ones(20,50);
t = 42;
% parfor with a maximum of 4 workers; in the generated C/C++ code this
% loop is parallelized with OpenMP
parfor (i = 1:10, 4)
    A(i,1) = A(i,1) + t;
end
y = A(1,4) + u;
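For the TE1401 workflow with the MATLAB Coder™, a plain code generation call for this function could look like the sketch below; it only illustrates that OpenMP emission is controlled by the EnableOpenMP option. The input specification (-args {0}, a scalar double) and the 'lib' configuration are assumptions of this sketch and do not replace the TwinCAT-specific build steps.

% Sketch: generate C code for MyFunction with MATLAB Coder
cfg = coder.config('lib');
cfg.EnableOpenMP = true;   % emit the parfor loop as OpenMP code
codegen MyFunction -args {0} -config cfg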
No special settings regarding OpenMP have to be made for the TwinCAT target; you generate your TwinCAT objects as usual. The Simulink Coder generates OpenMP code from this function, so that the C/C++ code is parallelized accordingly. The Embedded Coder is not required for this feature.
In the TwinCAT XAE you can now instantiate the created TcCOM object or the PLC FB and configure it accordingly. As usual, the object instance offers only one cyclic task interface under the Context tab. In this example, a Task 2 with a 200 ms cycle time is created and assigned to the object.
Under Parameter (Init) there is a parameter JobPoolID. In addition, as far as it can be determined from the C/C++ code, the number of workers that can work in parallel is also shown here. A JobPool is an organizational unit for JobTasks; the JobTasks themselves can be created in the Tasks node.
Accordingly, an object of type TcJobPool must be added under TcCOM Objects with "Add new item". Under Parameter (Init) on the TcJobPool object, the JobPoolId must be entered and a group of JobTasks referenced. First define how many JobTasks the pool should combine, then select the JobTasks from the drop-down menu.
Under System > Realtime you can distribute JobTasks to different cores.
Execution in the configuration shown above then takes place as follows: Task 2 is executed on core 4 and cyclically drives the OpenMP object. The code fragments generated as OpenMP code can then offload work to the configured JobTasks via the JobPool. When the JobTasks have finished their calculations, all partial results are collected again and Task 2 on core 4 executes the code to the end.