Execution Provider
The following Execution Providers are currently available:
- CPU
- CUDA
The Execution Provider is passed to the server via the Configure method of FB_MlSvrPrediction.
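A minimal Structured Text sketch of selecting an Execution Provider is shown below. Only FB_MlSvrPrediction, the Configure method and the provider names CPU and CUDA come from this section; the structure name ST_PredictionParameters, its field names and the exact Configure signature are illustrative assumptions.

PROGRAM MAIN
VAR
    fbPredict   : FB_MlSvrPrediction;       // prediction FB from this section
    stParams    : ST_PredictionParameters;  // assumed name of the session specification
    bConfigured : BOOL;
END_VAR

IF NOT bConfigured THEN
    // Choose the Execution Provider before the session is created.
    // Field name and type (string vs. enumeration) are assumptions.
    stParams.sExecutionProvider := 'CUDA';  // or 'CPU'

    // Pass the session specification to the TcMlServer; the return
    // convention (TRUE once configuration has completed) is assumed.
    bConfigured := fbPredict.Configure(stParams);
END_IF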
CPU
The loaded AI model is executed on the CPU resources of the IPC. If a TwinCAT runtime is active on the same device, only the CPU resources not reserved by TwinCAT are available: isolated cores are used exclusively by TwinCAT, and shared cores are available to the operating system only for the share of time not claimed by the TwinCAT real-time. The operating system handles parallelizing the computation across all available threads.
If several clients create a session with Execution Provider CPU on one server, the inference requests are processed one after the other. The order in which queued requests are served can be influenced via the Priority parameter of Predict.
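The following sketch issues a prediction with an explicit priority. Only Predict itself and the existence of a priority parameter come from this section; the parameter names, the buffer-pointer call convention and the timeout are assumptions.

VAR
    fbPredict : FB_MlSvrPrediction;
    fInput    : ARRAY [0..9] OF REAL;  // example input buffer (shape assumed)
    fOutput   : ARRAY [0..1] OF REAL;  // example output buffer (shape assumed)
    bDone     : BOOL;
END_VAR

// With Execution Provider CPU, requests from several clients are queued
// and executed sequentially; the priority influences the order in which
// queued requests are served.
bDone := fbPredict.Predict(
    pDataIn   := ADR(fInput),  nDataInSize  := SIZEOF(fInput),
    pDataOut  := ADR(fOutput), nDataOutSize := SIZEOF(fOutput),
    nTimeout  := 100,   // ms, assumed unit
    nPriority := 1);    // assumed convention: lower value = higher priority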
CUDA
The loaded AI model is executed on the GPU resources of the IPC. Several sessions can be run in parallel on one GPU if the GPU resources are sufficient.
The nDeviceId field allows you to distinguish between multiple installed graphics cards. On computers with at most one graphics card, the value can be left at its default; otherwise the field corresponds to the CUDA device index of the respective card. If a computer with several graphics cards is used, the following system environment variable should be set so that the CUDA device order follows the PCI bus order and remains deterministic:
CUDA_DEVICE_ORDER='PCI_BUS_ID'
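A sketch of addressing a specific graphics card follows. nDeviceId comes from this section; the structure name and the provider field are the same assumptions as above.

VAR
    stParams : ST_PredictionParameters;  // assumed structure name
END_VAR

// Address the second graphics card. With CUDA_DEVICE_ORDER='PCI_BUS_ID'
// set on the host, the CUDA device indices follow the PCI bus order,
// so the mapping from nDeviceId to a physical card stays deterministic
// across reboots and driver updates.
stParams.sExecutionProvider := 'CUDA';  // field name assumed
stParams.nDeviceId          := 1;       // 0 = first card (default)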
Furthermore, the TcMlServer allows inference resources to be shared between different FBs that provide the same specification for their inference session. Using a shared inference engine (bExclusiveSession = FALSE) is useful in cases where a bottleneck would otherwise be expected, for example in graphics card memory. For stateful models (e.g. recurrent models), however, such a configuration should be avoided, since the sessions would share the model's internal state.
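A sketch of opting into a shared inference engine closes the section. bExclusiveSession comes from this section; the structure name is the same assumption as above.

VAR
    stParams : ST_PredictionParameters;  // assumed structure name
END_VAR

// Let the TcMlServer share one inference engine between all FBs whose
// session specification (model, Execution Provider, device) is equal.
// This relieves, for example, graphics card memory, but must not be
// combined with stateful (e.g. recurrent) models, since the sessions
// would then share the model's internal state.
stParams.bExclusiveSession := FALSE;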