mixed_data.freq_cat_onln_gfmm

General fuzzy min-max neural network trained by the batch incremental learning algorithm, in which categorical features are encoded using the ordinal encoding method and the similarity among categorical values is computed using their frequency of occurrence with respect to all class labels in a training set.

class hbbrain.mixed_data.freq_cat_onln_gfmm.FreqCatOnlineGFMM(theta=0.5, theta_min=1, eta=0.5, gamma=1, alpha=0.9, V=None, W=None, E=None, F=None, C=None)

Bases: BaseHyperboxClassifier

Batch incremental learning algorithm with mixed-attribute data for a general fuzzy min-max neural network, in which categorical features are encoded using the ordinal encoding method and the similarity degrees among categorical values are computed using their frequency of occurrence with respect to all class labels in a training set.

This algorithm uses a distance measure between any two values of a categorical variable based on the occurrence probability of such categorical values with respect to the values of the class variable. This distance is then normalised and used to compute the membership values for categorical features in conjunction with membership values of continuous features to generate the final membership values for mixed-attribute data.
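The idea of a class-conditional distance between categorical values can be illustrated with a minimal sketch. It assumes the distance is half the L1 gap between the class distributions of two values, which keeps it in [0, 1]; the exact formula and normalisation used by the library are given in [1] and may differ. The helper names `class_conditional_probs` and `categorical_distance` are hypothetical and not part of hbbrain.

```python
from collections import Counter, defaultdict


def class_conditional_probs(values, labels):
    """Estimate P(class | categorical value) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    classes = sorted(set(labels))
    return {
        value: [cnt[c] / sum(cnt.values()) for c in classes]
        for value, cnt in counts.items()
    }


def categorical_distance(probs, a, b):
    """Assumed distance: half the L1 gap between class distributions."""
    return 0.5 * sum(abs(pa - pb) for pa, pb in zip(probs[a], probs[b]))


colours = ["red", "red", "blue", "blue", "green", "green"]
labels = [0, 0, 1, 1, 0, 1]
probs = class_conditional_probs(colours, labels)

d_red_blue = categorical_distance(probs, "red", "blue")    # disjoint classes
d_red_green = categorical_distance(probs, "red", "green")  # partial overlap
similarity_red_green = 1.0 - d_red_green                   # assumed conversion
```

Two values that always occur with different classes end up maximally distant, while values with similar class distributions are treated as close, regardless of their ordinal codes.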

See [1] for more detailed information regarding this batch incremental learning algorithm.

Parameters:

theta (float, optional, default=0.5): Maximum hyperbox size for continuous features.
theta_min (float, optional, default=1): Minimum value of the maximum hyperbox size for continuous features for which the training loop is still performed. If theta_min is larger than theta, it is automatically set equal to theta.
gamma (float or ndarray of shape (n_continuous_features,), optional, default=1): Sensitivity parameter controlling how quickly the membership function decreases in each continuous feature.
eta (float, optional, default=0.5): Maximum hyperbox size for categorical features.
alpha (float, optional, default=0.9): Multiplier used to reduce the maximum hyperbox size after each training loop.
V (array-like of shape (n_hyperboxes, n_continuous_features)): Matrix storing the minimal points for continuous features of all existing hyperboxes; each row is the minimal point of one hyperbox.
W (array-like of shape (n_hyperboxes, n_continuous_features)): Matrix storing the maximal points for continuous features of all existing hyperboxes; each row is the maximal point of one hyperbox.
E (array-like of shape (n_hyperboxes, n_cat_features)): Matrix storing the lower bounds for categorical features of all existing hyperboxes; each row is the lower bound of one hyperbox.
F (array-like of shape (n_hyperboxes, n_cat_features)): Matrix storing the upper bounds for categorical features of all existing hyperboxes; each row is the upper bound of one hyperbox.
C (array-like of shape (n_hyperboxes,)): Vector storing the class labels corresponding to the existing hyperboxes.
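
The interaction between theta, theta_min and alpha can be sketched as a schedule of maximum hyperbox sizes. This is an illustration only: the clipping of theta_min to theta is documented above, but the stopping rule (a pass runs while the current size is at least theta_min) is an assumption, and `theta_schedule` is a hypothetical helper rather than a library function.

```python
def theta_schedule(theta=0.5, theta_min=1.0, alpha=0.9):
    """List the maximum hyperbox sizes used across training passes."""
    if not 0 < alpha < 1 or theta <= 0 or theta_min <= 0:
        raise ValueError("expects 0 < alpha < 1 and positive sizes")
    # Documented behaviour: theta_min larger than theta is reset to theta.
    theta_min = min(theta_min, theta)
    sizes = []
    current = theta
    # Assumption: a pass runs while the current size is at least theta_min.
    while current >= theta_min:
        sizes.append(current)
        current *= alpha  # shrink the maximum size after each pass
    return sizes


single_pass = theta_schedule()  # defaults: theta_min is clipped to 0.5
multi_pass = theta_schedule(theta=0.5, theta_min=0.3)
```

Under these assumptions the default settings give a single pass, since theta_min=1 is clipped to theta=0.5, whereas lowering theta_min allows several passes with progressively smaller hyperboxes.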

References

[1] T. T. Khuat and B. Gabrys, “An in-depth comparison of methods handling mixed-attribute data for general fuzzy min–max neural network”, Neurocomputing, vol. 464, pp. 175-202, 2021.

Examples

>>> from hbbrain.mixed_data.freq_cat_onln_gfmm import FreqCatOnlineGFMM
>>> from hbbrain.datasets import load_japanese_credit
>>> X, y = load_japanese_credit()
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> numerical_features = [1, 2, 7, 10, 13, 14]
>>> categorical_features = [0, 3, 4, 5, 6, 8, 9, 11, 12]
>>> scaler.fit(X[:, numerical_features])
MinMaxScaler()
>>> X[:, numerical_features] = scaler.transform(X[:, numerical_features])
>>> clf = FreqCatOnlineGFMM(theta=0.1, eta=0.6)
>>> clf.fit(X, y, categorical_features)
>>> print("Number of hyperboxes = %d"%clf.get_n_hyperboxes())
Number of hyperboxes = 416
>>> clf.predict(X[[10, 100]])
array([1, 0])

Attributes:

similarity_of_cat_vals (array-like of shape (n_cat_features,)): Array storing the similarity values among all pairs of categorical values for each categorical feature index. Each element is a dictionary whose keys are hashed values of two categorical values and whose values are the corresponding similarity values.
categorical_features_ (int array of shape (n_cat_features,)): Indices of categorical features in the training data and hyperboxes.
continuous_features_ (int array of shape (n_continuous_features,)): Indices of continuous features in the training data and hyperboxes.
encoder_ (sklearn.preprocessing.OrdinalEncoder): The ordinal encoder used to encode categorical features.
is_exist_continuous_missing_value (boolean): Whether there are any missing values in the continuous features of the training data.
elapsed_training_time (float): Training time in seconds.
n_passes (int): Number of training loops.

Methods

`delay`([delay_constant])	Delay a time period to display hyperboxes
`draw_hyperbox_and_boundary`([window_name, ...])	Draw the existing hyperboxes and their decision boundaries among classes
`fit`(X, y[, categorical_features])	Build a general fuzzy min-max neural network from the training set (X, y) using the original incremental learning algorithm, in which categorical features are encoded using the ordinal encoding method and the similarity among categorical values is computed using their frequency of occurrence with respect to all class labels in a training set.
`get_n_hyperboxes`()	Get number of hyperboxes in the trained hyperbox-based model
`get_params`([deep])	Get parameters for this estimator.
`get_sample_explanation`(x)	Get useful information for explaining the reason behind the predicted result for the input pattern represented by upper and lower bounds for continuous features together with the lower and upper bounds for the categorical features.
`initialise_canvas_graph`([n_dims, ...])	Initialise a canvas to draw hyperboxes
`is_satisfied_cat_expansion_conds`(Ej, Fj, x_cat)	Check whether the expansion condition for categorical features x_cat of an input pattern can be covered by categorical bounds of the hyperbox Bj with the categorical features stored in the lower bound Ej and the upper bound Fj.
`predict`(X)	Predict class labels for samples in X.
`predict_proba`(X)	Predict class probabilities of the input samples X including both continuous and categorical features.
`predict_with_membership`(X)	Predict class membership values of the input samples X including both categorical and continuous features.
`score`(X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.
`show_sample_explanation`(xl, xu, ...[, ...])	Show explanation for predicted results of an input pattern under the form of parallel coordinates or hyperboxes in 2D or 3D planes.
`simple_pruning`(X_val, y_val[, ...])	Prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox.

fit(X, y, categorical_features=None)

Build a general fuzzy min-max neural network from the training set (X, y) using the original incremental learning algorithm, in which categorical features are encoded using the ordinal encoding method and the similarity among categorical values is computed using their frequency of occurrence with respect to all class labels in a training set.

Parameters:

X (array-like of shape (n_samples, n_features) or (2*n_samples, n_features)): The training input samples, including both continuous and categorical features. If X has 2*n_samples rows, the first n_samples rows contain the lower bounds of the input patterns and the remaining n_samples rows contain the upper bounds.
y (array-like of shape (n_samples,)): The class labels.
categorical_features (list of int, optional, default=None): Indices of categorical features in the training set. If None, there are no categorical features.

Returns:

self (object): Fitted estimator.
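
The stacked layout accepted for X (lower bounds in the first n_samples rows, upper bounds in the rest) can be illustrated with a small sketch using plain lists. `split_bounds` is a hypothetical helper written for this illustration; it does not reproduce the library's internal handling.

```python
def split_bounds(X, n_samples):
    """Split a training matrix into lower-bound and upper-bound rows."""
    if len(X) == n_samples:
        # Point data: lower and upper bounds coincide.
        return X, X
    if len(X) == 2 * n_samples:
        return X[:n_samples], X[n_samples:]
    raise ValueError("X must have n_samples or 2*n_samples rows")


y = [0, 1]
X_stacked = [
    [0.1, 0.2],  # lower bound of pattern 0
    [0.5, 0.6],  # lower bound of pattern 1
    [0.3, 0.4],  # upper bound of pattern 0
    [0.7, 0.9],  # upper bound of pattern 1
]
Xl, Xu = split_bounds(X_stacked, n_samples=len(y))
```

Row i of the first half and row i of the second half describe the same pattern, so y keeps n_samples entries in both layouts.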

get_n_hyperboxes()

Get number of hyperboxes in the trained hyperbox-based model

Returns:

int: Number of hyperboxes in the trained hyperbox-based classifier.

get_sample_explanation(x)

Get useful information for explaining the reason behind the predicted result for the input pattern represented by upper and lower bounds for continuous features together with the lower and upper bounds for the categorical features.

Parameters:

x (ndarray of shape (n_features,)): The input pattern to be explained, including both continuous and categorical features.

Returns:

y_pred (int): The predicted class of the input pattern.
dict_mem_val_classes (dictionary): Dictionary storing the membership values for all classes. The key is the class label and the value is the corresponding membership value.
dict_min_point_classes (dictionary): Dictionary storing the minimal points of the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the minimal point of the hyperbox corresponding to that class.
dict_max_point_classes (dictionary): Dictionary storing the maximal points of the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the maximal point of the hyperbox corresponding to that class.
dict_min_point_cat_classes (dictionary): Dictionary storing the lower bounds of categorical features for the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the lower bound of categorical features for the hyperbox corresponding to that class.
dict_max_point_cat_classes (dictionary): Dictionary storing the upper bounds of categorical features for the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the upper bound of categorical features for the hyperbox corresponding to that class.

is_satisfied_cat_expansion_conds(Ej, Fj, x_cat)

Check whether the expansion condition for categorical features x_cat of an input pattern can be covered by categorical bounds of the hyperbox Bj with the categorical features stored in the lower bound Ej and the upper bound Fj.

Parameters:

Ej (array-like of shape (n_cat_features,)): Lower bound of categorical features in the hyperbox Bj which can be extended to cover the input pattern.
Fj (array-like of shape (n_cat_features,)): Upper bound of categorical features in the hyperbox Bj which can be extended to cover the input pattern.
x_cat (array-like of shape (n_cat_features,)): Categorical features of an input pattern.

Returns:

bool: If True, the categorical features of the hyperbox Bj satisfy the expansion conditions, so the hyperbox can be expanded to cover the input pattern. Otherwise, the conditions for the categorical features are not met.

predict(X)

Predict class labels for samples in X.

Note

If several winner hyperboxes representing different class labels have the same membership value with respect to the input pattern \(X_i\), an additional criterion is used: the minimum Manhattan distance between the continuous features of \(X_i\) and the central points of the continuous features of the winner hyperboxes. The class label of the hyperbox selected in this way is used as the predicted class label of \(X_i\). If there are only categorical features and several winner hyperboxes belong to different classes, the final class label is chosen at random.

Parameters:

X (array-like of shape (n_samples, n_features)): The data matrix for which we want to predict the targets.

Returns:

y_pred (ndarray of shape (n_samples,)): Vector containing the predictions. In binary and multiclass problems, this is a vector with n_samples entries.
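
The tie-breaking rule described in the note of predict can be sketched as follows. The example assumes the tied winner hyperboxes are already known and only shows the Manhattan-distance step on continuous features; `break_tie_by_manhattan` is a hypothetical helper, not the library's implementation.

```python
def break_tie_by_manhattan(x_cont, winners):
    """Pick a class among tied hyperboxes by the closest centre (L1 distance).

    winners: list of (class_label, v, w), where v and w are the minimal and
    maximal points of a hyperbox for the continuous features.
    """
    def distance_to_centre(box):
        _, v, w = box
        return sum(abs(x - (lo + hi) / 2.0) for x, lo, hi in zip(x_cont, v, w))

    return min(winners, key=distance_to_centre)[0]


tied = [
    (0, [0.0, 0.0], [0.2, 0.2]),  # centre (0.1, 0.1)
    (1, [0.4, 0.4], [0.6, 0.6]),  # centre (0.5, 0.5)
]
label = break_tie_by_manhattan([0.45, 0.5], tied)
```

Here the input lies closest to the centre of the second hyperbox, so its class label is selected.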

predict_proba(X)

Predict class probabilities of the input samples X including both continuous and categorical features.

The predicted class probability is the ratio of the membership value of the representative hyperbox of that class to the sum of the membership values of the representative hyperboxes of all classes.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

proba (ndarray of shape (n_samples, n_classes)): The class probabilities of the input samples. Classes are ordered by ascending integer class label.
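
The normalisation described for predict_proba can be sketched for a single sample. The input is assumed to be the list of per-class membership values of the representative hyperboxes; the uniform fallback for an all-zero input is an assumption of this sketch, and `memberships_to_proba` is a hypothetical helper.

```python
def memberships_to_proba(class_memberships):
    """Normalise per-class membership values so they sum to one."""
    total = sum(class_memberships)
    if total == 0:
        # Assumption: fall back to a uniform distribution when all are zero.
        return [1.0 / len(class_memberships)] * len(class_memberships)
    return [m / total for m in class_memberships]


proba = memberships_to_proba([0.75, 0.25, 0.5])
```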

predict_with_membership(X)

Predict class membership values of the input samples X including both categorical and continuous features.

The predicted class membership value is the membership value of the representative hyperbox of that class.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

mem_vals (ndarray of shape (n_samples, n_classes)): The class membership values of the input samples. Classes are ordered by ascending integer class label.
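
A minimal sketch of how per-class membership values could be derived for one sample, assuming the representative hyperbox of a class is the one with the highest membership value among the hyperboxes of that class. `class_memberships` is a hypothetical helper and takes precomputed per-hyperbox membership values as input.

```python
def class_memberships(box_memberships, box_classes):
    """Return the highest membership per class, ordered by ascending label."""
    best = {}
    for membership, label in zip(box_memberships, box_classes):
        if membership > best.get(label, -1.0):
            best[label] = membership
    # Columns follow ascending class labels, as documented for mem_vals.
    return [best[label] for label in sorted(best)]


mem_vals = class_memberships([0.2, 0.9, 0.6, 0.4], [0, 0, 1, 1])
```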

simple_pruning(X_val, y_val, acc_threshold=0.5, keep_empty_boxes=False)

Prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox.

Parameters:

X_val (array-like of shape (n_samples, n_features)): The data matrix containing both continuous and categorical features of the validation patterns.
y_val (ndarray of shape (n_samples,)): Vector containing the true class label of each validation pattern.
acc_threshold (float, optional, default=0.5): The minimum accuracy for each hyperbox to be kept unchanged.
keep_empty_boxes (boolean, optional, default=False): Whether to keep the hyperboxes that do not take part in the prediction process on the validation set. If True, they are kept; otherwise, the decision to keep or remove them is based on the classification accuracy on the validation dataset.

Returns:

self: A hyperbox-based model with the low-quality hyperboxes pruned.
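
The per-hyperbox accuracy filter can be sketched as below. The inputs are assumed to be, for each hyperbox, the number of validation samples it classified correctly and the number for which it was the winner. Two simplifications are assumptions of this sketch: a hyperbox exactly at the threshold is kept, and unused hyperboxes are simply dropped when keep_empty_boxes is False, whereas the library bases that decision on validation accuracy. `boxes_to_keep` is a hypothetical helper.

```python
def boxes_to_keep(n_correct, n_used, acc_threshold=0.5, keep_empty_boxes=False):
    """Return indices of hyperboxes that survive accuracy-based pruning."""
    kept = []
    for idx, (correct, used) in enumerate(zip(n_correct, n_used)):
        if used == 0:
            # Hyperbox never won a validation sample.
            if keep_empty_boxes:
                kept.append(idx)
            continue
        # Assumption: a hyperbox exactly at the threshold is kept.
        if correct / used >= acc_threshold:
            kept.append(idx)
    return kept


n_correct = [8, 1, 0]
n_used = [10, 4, 0]
kept_default = boxes_to_keep(n_correct, n_used)
kept_with_empty = boxes_to_keep(n_correct, n_used, keep_empty_boxes=True)
```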

hbbrain.mixed_data.freq_cat_onln_gfmm.compute_similarity_among_categorical_values(X_cat, y)

Compute the similarity among pairs of categorical values for each categorical feature.

Parameters:

X_cat (array-like of shape (n_samples, n_cat_features)): Input patterns containing only categorical features.
y (array-like of shape (n_samples,)): The class label corresponding to each input pattern.

Returns:

similarity_of_cat_vals (array-like of shape (n_cat_features,)): Array storing the similarity values among all pairs of categorical values for each categorical feature index. Each element is a dictionary whose keys are hashed values of two categorical values and whose values are the corresponding similarity values.

hbbrain.mixed_data.freq_cat_onln_gfmm.ordinal_encode_categorical_features(X, categorical_features, encoder=None)

Encode categorical features as an integer array.

Parameters:

X (array-like of shape (n_samples, n_features)): Input data matrix including both continuous and categorical features.
categorical_features (list of int): Indices of categorical features in X.
encoder (sklearn.preprocessing.OrdinalEncoder, optional, default=None): An existing ordinal encoder to use for encoding categorical features.

Returns:

X (array-like of shape (n_samples, n_features)): The input data matrix with the categorical features encoded.
encoder (sklearn.preprocessing.OrdinalEncoder): The ordinal encoder used to encode categorical features.
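
The encode-then-reuse pattern of this function can be illustrated with a pure-Python stand-in. It is a sketch only: the library relies on sklearn.preprocessing.OrdinalEncoder, whereas the hypothetical `ordinal_encode` helper below keeps a plain dictionary of sorted categories per column in place of a fitted encoder.

```python
def ordinal_encode(X, categorical_features, categories=None):
    """Replace categorical columns with integer codes.

    Returns the encoded copy of X and the categories used for the codes.
    """
    if categories is None:
        # Fit: sorted unique values per categorical column define the codes.
        categories = {
            col: sorted({row[col] for row in X}) for col in categorical_features
        }
    X_encoded = [list(row) for row in X]
    for col in categorical_features:
        codes = {value: code for code, value in enumerate(categories[col])}
        for row in X_encoded:
            row[col] = codes[row[col]]
    return X_encoded, categories


X = [[0.2, "b", "yes"], [0.7, "a", "no"], [0.5, "b", "no"]]
X_enc, cats = ordinal_encode(X, categorical_features=[1, 2])
# Reuse the fitted categories so new data receives the same codes.
X_new, _ = ordinal_encode([[0.1, "a", "yes"]], [1, 2], categories=cats)
```

Passing the previously fitted categories back in mirrors the role of the encoder argument: unseen data must be encoded with the mapping learned on the training set.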

hbbrain.mixed_data.freq_cat_onln_gfmm.predict_freq_cat_feature_manhanttan(V, W, E, F, C, Xl, Xu, X_cat, similarity_of_cat_vals, g=1)

Predict class labels for samples in the form of hyperboxes with continuous features represented by lower bounds Xl and upper bounds Xu and categorical features stored in X_cat. The predicted results are computed from existing hyperboxes with continuous feature matrices for lower bounds V and upper bounds W and categorical feature matrices for lower bounds E and upper bounds F.

Note

If several winner hyperboxes representing different class labels have the same membership value with respect to the input pattern \(X_i\), given as a hyperbox with a lower bound \(Xl_i\) and an upper bound \(Xu_i\) for continuous features and a matrix \(Xcat_i\) for categorical features, an additional criterion is used: the minimum Manhattan distance between the central point of the continuous features of the input hyperbox \(X_i = [Xl_i, Xu_i]\) and the central points of the continuous features of the winner hyperboxes. The class label of the hyperbox selected in this way is used as the predicted class label of the input hyperbox \(X_i\).

Warning

The categorical features stored in X_cat need to be encoded using the function ordinal_encode_categorical_features() before being passed to this function.

Parameters:

V (array-like of shape (n_hyperboxes, n_continuous_features)): Matrix storing the minimal points for all continuous features of all hyperboxes of a trained hyperbox-based model; each row is the minimal point of one hyperbox.
W (array-like of shape (n_hyperboxes, n_continuous_features)): Matrix storing the maximal points for all continuous features of all hyperboxes of a trained hyperbox-based model; each row is the maximal point of one hyperbox.
E (array-like of shape (n_hyperboxes, n_cat_features)): Matrix storing the lower bounds for all categorical features of all hyperboxes of a trained hyperbox-based model; each row is the lower bound for categorical features of one hyperbox.
F (array-like of shape (n_hyperboxes, n_cat_features)): Matrix storing the upper bounds for all categorical features of all hyperboxes of a trained hyperbox-based model; each row is the upper bound for categorical features of one hyperbox.
C (array-like of shape (n_hyperboxes,)): Array containing the class labels of all hyperboxes of a trained hyperbox-based model.
Xl (array-like of shape (n_samples, n_continuous_features)): The data matrix containing lower bounds for continuous features of the input patterns for which we want to predict the targets.
Xu (array-like of shape (n_samples, n_continuous_features)): The data matrix containing upper bounds for continuous features of the input patterns for which we want to predict the targets.
X_cat (array-like of shape (n_samples, n_cat_features)): The data matrix containing categorical bounds for categorical features of the input patterns for which we want to predict the targets.
similarity_of_cat_vals (array-like of shape (n_cat_features,)): Array storing the similarity values among all pairs of categorical values for each categorical feature index. Each element is a dictionary whose keys are hashed values of two categorical values and whose values are the corresponding similarity values.
g (float or array-like of shape (n_features,), optional, default=1): Sensitivity parameter controlling how quickly the membership function decreases in each continuous dimension.

Returns:

y_pred (array-like of shape (n_samples,)): Predicted class labels for all input patterns.
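
The effect of the sensitivity parameter g on continuous features can be sketched with the ramp-based membership function commonly used for general fuzzy min-max networks. This is a scalar-g, pure-Python illustration for one input hyperbox and one model hyperbox; `ramp` and `continuous_membership` are hypothetical helpers, and the library's vectorised implementation, as well as the way this value is combined with the categorical membership, may differ.

```python
def ramp(r, g):
    """Ramp threshold function: 0 below zero, linear in between, capped at 1."""
    return min(1.0, max(0.0, r * g))


def continuous_membership(xl, xu, v, w, g=1.0):
    """Membership of the input box [xl, xu] in the hyperbox [v, w]."""
    degrees = []
    for xl_i, xu_i, v_i, w_i in zip(xl, xu, v, w):
        above = 1.0 - ramp(xu_i - w_i, g)  # penalty for exceeding the max point
        below = 1.0 - ramp(v_i - xl_i, g)  # penalty for falling under the min point
        degrees.append(min(above, below))
    # The worst dimension determines the overall membership.
    return min(degrees)


v, w = [0.2, 0.2], [0.4, 0.4]
inside = continuous_membership([0.3, 0.3], [0.3, 0.3], v, w)
near = continuous_membership([0.5, 0.3], [0.5, 0.3], v, w, g=1.0)
steep = continuous_membership([0.5, 0.3], [0.5, 0.3], v, w, g=4.0)
```

A point inside the hyperbox has full membership, and a larger g makes the membership drop faster as the point moves away from the hyperbox.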