FB_ALY_SequentialKMeans

The Sequential k-Means function block is an implementation of the unsupervised clustering algorithm of the same name. It is a sequential variant of the widely used k-Means clustering algorithm, designed for streaming data. The aim of the algorithm is to find clusters based on the structure of the data, such that each cluster contains similar data points and dissimilar data points fall into different clusters.

The number of input channels (referred to below as n) for this algorithm can be freely selected by the user. These inputs span the n-dimensional feature space in which the clusters are found. In each analysis cycle, the data stream provides the algorithm with a new feature vector that can be interpreted as a data point in this feature space. Data points that are close to each other in this feature space are assigned to the same cluster. The number of clusters present must be set by the user before the analysis begins and remains fixed.
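The assignment of a feature vector to its closest cluster can be illustrated with a short Python sketch. It is not the library's implementation: the function name `nearest_cluster` is hypothetical, the Euclidean distance is taken from the description further below, and the 0-based index is a Python convention, not a statement about the index base of `nClusterIdx`.

```python
import math

def nearest_cluster(point, centers):
    """Return (index, distance) of the center closest to `point`.

    Distance is the Euclidean norm of the difference vector.
    The index is 0-based here (Python convention, an assumption of this sketch).
    """
    best_idx, best_dist = -1, math.inf
    for idx, center in enumerate(centers):
        dist = math.dist(point, center)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist

# Two input channels (n = 2) and three fixed clusters.
centers = [(-30.0, 0.0), (10.0, 2.0), (30.0, 4.0)]
idx, dist = nearest_cluster((12.0, 2.0), centers)
```

In this sketch, the returned pair plays the role that the outputs `nClusterIdx` and `fDistance` play for the function block.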

In contrast to the k-Means algorithm for conventional batch analysis, the data for Sequential k-Means are not fully available at the time of analysis. Instead, the data points arrive one by one as streaming data. They are therefore processed sequentially, and each one is assigned to the cluster closest to it. This approach results in a number of differences, two of which are particularly relevant to the use of the algorithm and to its parameter settings.

On the one hand, in a batch analysis all data points, and thus the value ranges of the individual features, are available from the start. In a sequential analysis they are not, so the value ranges are not necessarily known in advance. It is nevertheless helpful to know the value ranges of the input channels beforehand, even though the actual values only arrive during the course of the analysis. This is particularly important for the initialization of the cluster centers, for which three approaches are available. The center points can be specified as concrete values via a parameter array. Alternatively, they can be set randomly or equidistantly within a defined range of values. The initialization modes Random and Equidistant require the value ranges, which have to be set for the individual input channels via the parameters Lower Bounds and Upper Bounds.
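The three initialization approaches can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the exact placement rule of the Equidistant mode is not stated here, so the sketch assumes that the centers are spaced evenly on the straight line between the lower and the upper bounds.

```python
import random

def init_from_values(values):
    """Values mode: use the cluster centers exactly as given."""
    return [list(center) for center in values]

def init_random(num_clusters, lower, upper, seed=None):
    """Random mode: draw each coordinate uniformly within its channel bounds."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(num_clusters)
    ]

def init_equidistant(num_clusters, lower, upper):
    """Equidistant mode (assumed rule): evenly spaced points from lower to upper."""
    if num_clusters == 1:
        # A single center is placed in the middle of the range.
        return [[(lo + hi) / 2 for lo, hi in zip(lower, upper)]]
    return [
        [lo + (hi - lo) * i / (num_clusters - 1) for lo, hi in zip(lower, upper)]
        for i in range(num_clusters)
    ]

lower_bounds = [0.0, 0.0]   # one entry per input channel
upper_bounds = [10.0, 4.0]
equidistant = init_equidistant(3, lower_bounds, upper_bounds)
randomized = init_random(3, lower_bounds, upper_bounds, seed=1)
```

The sketch makes the dependency visible: only the Values mode works without bounds, while the other two modes need a lower and an upper bound per input channel.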

On the other hand, in a batch analysis all data points are typically traversed multiple times to update the cluster centers until they change only minimally. This is not possible in a sequential analysis. To still be able to adjust the cluster centers and traverse data points multiple times, the Sequential k-Means algorithm has a buffering mechanism, referred to as the Aggregation Buffer, which temporarily stores a limited number of values.

While the buffer is being filled, each incoming data point is assigned to the closest cluster. The distance between a data point and the cluster centers is determined by the Euclidean norm. The cluster centers are only updated once the buffer is full, based on the newly assigned data points in the buffer. The new cluster center corresponds to the mean value of all data points contained in the cluster. This mean can be calculated incrementally, so that the old data points are not needed for the calculation.

The size of the buffer is set by the parameter Aggregation Buffer Size; the default value is 10. The parameter Max Iterations specifies the number of iterations through the buffer; the default value is 1. If the value is set to 2, for example, the data points in the buffer are reassigned to the clusters after the first adjustment of the cluster centers, and the cluster centers are then adjusted again. Because the cluster centers shift, individual data points may be assigned to different clusters from one iteration to the next.

Because the computing capacity available for data processing between two cycles is limited, excessively high values should be avoided for the parameters Aggregation Buffer Size and Max Iterations. Otherwise it cannot be guaranteed that the cluster centers are updated. If the cluster centers are updated with smaller parameter values but not with larger ones, this indicates that the computing capacity is insufficient for the larger values, and smaller values should be selected.
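The buffering mechanism can be illustrated with the following Python sketch. It is a simplified model under stated assumptions, not the library's implementation: the class and method names are hypothetical, the weighting of previously seen points through per-cluster counters is one common way to compute the mean incrementally, and restarting every additional iteration from the state before the buffer was applied is an assumption about how Max Iterations behaves.

```python
import math

class BufferedSequentialKMeans:
    """Sketch of sequential k-means with an aggregation buffer (hypothetical)."""

    def __init__(self, centers, buffer_size=10, max_iterations=1):
        self.centers = [list(c) for c in centers]
        self.counts = [0] * len(self.centers)  # points absorbed per cluster so far
        self.buffer_size = buffer_size
        self.max_iterations = max_iterations
        self.buffer = []

    def _nearest(self, point):
        dists = [math.dist(point, c) for c in self.centers]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def add(self, point):
        """Assign one streamed point; update the centers when the buffer is full."""
        idx, dist = self._nearest(point)
        self.buffer.append(list(point))
        if len(self.buffer) >= self.buffer_size:
            self._update_centers()
            self.buffer.clear()
        return idx, dist

    def _update_centers(self):
        # State before the buffer is applied; every iteration starts from it,
        # so old data points never have to be stored (incremental mean).
        base_centers = [list(c) for c in self.centers]
        base_counts = list(self.counts)
        new_counts = base_counts
        for _ in range(self.max_iterations):
            sums = [[0.0] * len(c) for c in base_centers]
            added = [0] * len(base_centers)
            for p in self.buffer:
                idx, _ = self._nearest(p)  # uses the centers of the last iteration
                added[idx] += 1
                for d, value in enumerate(p):
                    sums[idx][d] += value
            for k, base in enumerate(base_centers):
                total = base_counts[k] + added[k]
                if total > 0:
                    self.centers[k] = [
                        (base_counts[k] * base[d] + sums[k][d]) / total
                        for d in range(len(base))
                    ]
                else:
                    self.centers[k] = list(base)
            new_counts = [base_counts[k] + added[k] for k in range(len(added))]
        self.counts = new_counts

km = BufferedSequentialKMeans([[0.0], [10.0]], buffer_size=4, max_iterations=1)
results = [km.add(p) for p in ([1.0], [3.0], [9.0], [11.0])]
```

The cost of one update grows with the product of buffer size, number of iterations, and number of clusters, which is why large values for Aggregation Buffer Size and Max Iterations can exceed the time available between two cycles.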

Syntax

Definition:

FUNCTION_BLOCK FB_ALY_SequentialKMeans
VAR_OUTPUT
    ipResultMessage: Tc3_EventLogger.I_TcMessage;
    bError: BOOL;
    bNewResult: BOOL;
    bConfigured: BOOL;
    nClusterIdx: DINT;
    fDistance: LREAL;
END_VAR

Outputs

Name	Type	Description
ipResultMessage	I_TcMessage	Contains more detailed information on the current return value. This special interface pointer is internally secured so that it is always valid/assigned.
bError	BOOL	This output is `TRUE` if an error occurs.
bNewResult	BOOL	This output is `TRUE` when a new result has been calculated.
bConfigured	BOOL	This output is `TRUE` when the function block has been configured successfully.
nClusterIdx	DINT	Specifies the cluster index that the Sequential k-Means algorithm outputs for the data point of the current cycle.
fDistance	LREAL	Specifies the distance between the data point of the current cycle and the center of the cluster to which it is assigned.

Sample

VAR
    fbSequentialKMeans : FB_ALY_SequentialKMeans(nNumChannels := 2, nNumClusters := 3, nAggBufferSize := 10);
    
    nMaxIterations : UDINT := 1;
    eInitMode : E_ALY_KMeansInitMode := E_ALY_KMeansInitMode.Values;
    aInitialClusterCenters : ARRAY[1..3] OF ARRAY[1..2] OF LREAL := [[-30, 0], [10, 2], [30, 4]];    
    bConfigure : BOOL := TRUE;

    nInputCh1 : UDINT;
    fInputCh2 : LREAL; 
    bUpdateClusterCenters : BOOL := TRUE;

    aClusterCenters : ARRAY[1..3] OF ARRAY[1..2] OF LREAL;
END_VAR

// Configure algorithm
IF bConfigure THEN
    bConfigure := FALSE;

    fbSequentialKMeans.Configure(nMaxIterations := nMaxIterations, eInitMode := eInitMode);
    fbSequentialKMeans.SetInitialClusterCenters(ADR(aInitialClusterCenters), SIZEOF(aInitialClusterCenters));
END_IF

// Call algorithm
fbSequentialKMeans.SetChannelValue(1, nInputCh1);
fbSequentialKMeans.SetChannelValue(2, fInputCh2);
fbSequentialKMeans.Call(bUpdateClusterCenters, ADR(aClusterCenters), SIZEOF(aClusterCenters));

Requirements

Development environment	Target platform	PLC libraries to include
TwinCAT v3.1.4024.0	PC or CX (x64, x86)	Tc3_Analytics

Methods

Name	Definition location	Description
Call()	Local	Method for calculating the outputs for a specific configuration.
Configure()	Local	General configuration of the algorithm with its parameterized conditions.
FB_init()	Local	Initializes the number of input channels.
GetResults()	Local	Gets the result matrix without adding new values.
Reset()	Local	Resets all internal states or the calculations performed so far.
SetChannelValue()	Local	Method for passing values to the algorithm.
SetInitialBounds()	Local	Sets the initial boundaries. Depends on the configured initialization mode.
SetInitialClusterCenters()	Local	Sets the initial cluster centers. Depends on the configured initialization mode.

Further Information