mixed_data.onehot_onln_gfmm

General fuzzy min-max neural network trained by the batch incremental learning algorithm for mixed attribute data, in which categorical features are encoded using one-hot encoding.

class hbbrain.mixed_data.onehot_onln_gfmm.OneHotOnlineGFMM(theta=0.5, theta_min=1, min_percent_overlap_cat=0.5, gamma=1, alpha=0.9, V=None, W=None, D=None, C=None)[source]

Bases: BaseHyperboxClassifier

Batch incremental learning algorithm with mixed-attribute data for a general fuzzy min-max neural network, in which categorical features are encoded using the one-hot encoding method and the similarity degrees among categorical values are computed using one-hot encoding values with logical operators. The final membership value is the average of membership values for continuous features and membership values for categorical features.
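For intuition, the sketch below illustrates how such an averaged membership could be formed (a minimal, hypothetical illustration only; the variable names are assumptions and this is not the library's internal code):

>>> # Membership of one sample in one hyperbox, split into its two parts
>>> mem_continuous = 0.8   # fuzzy membership w.r.t. the min/max bounds V and W
>>> mem_categorical = 1.0  # similarity of the one-hot encoded categorical values
>>> final_membership = (mem_continuous + mem_categorical) / 2
>>> final_membership
0.9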

See [1] for more detailed information regarding this batch incremental learning algorithm.

Parameters:
theta : float, optional, default=0.5

Maximum hyperbox size for continuous features.

theta_min : float, optional, default=1

Minimum value of the maximum hyperbox size for continuous features so that the training loop is still performed. If the value of theta_min is larger than the value of theta, it will be automatically assigned a value equal to theta.

gamma : float or ndarray of shape (n_continuous_features,), optional, default=1

A sensitivity parameter describing the speed of decrease of the membership function in each continuous feature.

min_percent_overlap_cat : float, optional, default=0.5

The minimum proportion of categorical values in the categorical features of the input pattern that must match the values in the categorical dimensions of the winner hyperbox for that hyperbox to be expanded.

alpha : float, optional, default=0.9

Multiplier factor to reduce the value of maximum hyperbox size after each training loop.

V : array-like of shape (n_hyperboxes, n_continuous_features)

A matrix storing all minimal points for continuous features of all existing hyperboxes, in which each row is the minimal point of a hyperbox.

W : array-like of shape (n_hyperboxes, n_continuous_features)

A matrix storing all maximal points for continuous features of all existing hyperboxes, in which each row is the maximal point of a hyperbox.

D : array-like of shape (n_hyperboxes, n_cat_features)

A matrix storing the bounds for categorical features of all existing hyperboxes, in which each row is the categorical bound of a hyperbox. Elements in this matrix are binary strings.

C : array-like of shape (n_hyperboxes,)

A vector storing all class labels corresponding to the existing hyperboxes.

References

[1]

T. T. Khuat and B. Gabrys, “An in-depth comparison of methods handling mixed-attribute data for general fuzzy min–max neural network”, Neurocomputing, vol. 464, pp. 175-202, 2021.

Examples

>>> from hbbrain.mixed_data.onehot_onln_gfmm import OneHotOnlineGFMM
>>> from hbbrain.datasets import load_japanese_credit
>>> X, y = load_japanese_credit()
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> numerical_features = [1, 2, 7, 10, 13, 14]
>>> categorical_features = [0, 3, 4, 5, 6, 8, 9, 11, 12]
>>> scaler.fit(X[:, numerical_features])
MinMaxScaler()
>>> X[:, numerical_features] = scaler.transform(X[:, numerical_features])
>>> clf = OneHotOnlineGFMM(theta=0.1, min_percent_overlap_cat=0.6)
>>> clf.fit(X, y, categorical_features)
>>> print("Number of hyperboxes = %d"%clf.get_n_hyperboxes())
Number of hyperboxes = 236
>>> clf.predict(X[[10, 100]])
array([0, 0])
Attributes:
categorical_features_ : int array of shape (n_cat_features,)

Indices of categorical features in the training data and hyperboxes.

continuous_features_ : int array of shape (n_continuous_features,)

Indices of continuous features in the training data and hyperboxes.

encoder_ : sklearn.preprocessing.OneHotEncoder

The one-hot encoder used to encode categorical features.

is_exist_continuous_missing_value : boolean

Whether there are any missing values in the continuous features of the training data.

elapsed_training_time : float

Training time in seconds.

n_passes : int

Number of training loops.

Methods

delay([delay_constant])

Delay a time period to display hyperboxes

draw_hyperbox_and_boundary([window_name, ...])

Draw the existing hyperboxes and their decision boundaries among classes

fit(X, y[, categorical_features])

Build a general fuzzy min-max neural network from the training set (X, y) using the original incremental learning algorithm for mixed attribute data, in which categorical features are encoded using one-hot encoding.

get_n_hyperboxes()

Get number of hyperboxes in the trained hyperbox-based model

get_params([deep])

Get parameters for this estimator.

get_sample_explanation(x)

Get useful information for explaining the reason behind the predicted result for the input pattern represented by upper and lower bounds for continuous features together with the bound for categorical features.

initialise_canvas_graph([n_dims, ...])

Initialise a canvas to draw hyperboxes

is_satisfied_cat_expansion_conds(xd, Dj, ...)

Check whether the expansion condition for categorical features xd of an input pattern can be covered by categorical features of the hyperbox \(B_j\) with the categorical features stored in Dj.

predict(X)

Predict class labels for samples in X.

predict_proba(X)

Predict class probabilities of the input samples X including both continuous and categorical features.

predict_with_membership(X)

Predict class membership values of the input samples X including both categorical and continuous features.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

show_sample_explanation(xl, xu, ...[, ...])

Show explanation for predicted results of an input pattern under the form of parallel coordinates or hyperboxes in 2D or 3D planes.

simple_pruning(X_val, y_val[, ...])

Simply prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox.

fit(X, y, categorical_features=None)[source]

Build a general fuzzy min-max neural network from the training set (X, y) using the original incremental learning algorithm for mixed attribute data, in which categorical features are encoded using one-hot encoding.

Parameters:
X : array-like of shape (n_samples, n_features) or (2*n_samples, n_features)

The training input samples including both continuous and categorical features. If the number of rows in X is 2*n_samples, the first n_samples rows contain lower bounds of input patterns and the rest n_samples rows contain upper bounds.

y : array-like of shape (n_samples,)

The class labels.

categorical_features : a list of int, optional, default=None

Indices of categorical features in the training set. If None, there is no categorical feature.

Returns:
self : object

Fitted estimator.
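A minimal sketch of the interval-valued layout described above, using synthetic continuous-only data (categorical_features is left as None, as permitted). The values and hyperparameters are illustrative assumptions:

>>> import numpy as np
>>> from hbbrain.mixed_data.onehot_onln_gfmm import OneHotOnlineGFMM
>>> Xl = np.array([[0.1, 0.2], [0.6, 0.7]])   # lower bounds of 2 input patterns
>>> Xu = np.array([[0.2, 0.3], [0.8, 0.9]])   # upper bounds of the same patterns
>>> X = np.concatenate((Xl, Xu))              # first n_samples rows are lower bounds
>>> y = np.array([0, 1])
>>> clf = OneHotOnlineGFMM(theta=0.3)
>>> clf.fit(X, y)
>>> n_boxes = clf.get_n_hyperboxes()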

get_n_hyperboxes()[source]

Get number of hyperboxes in the trained hyperbox-based model

Returns:
int

Number of hyperboxes in the trained hyperbox-based classifier.

get_sample_explanation(x)[source]

Get useful information for explaining the reason behind the predicted result for the input pattern represented by upper and lower bounds for continuous features together with the bound for categorical features.

Parameters:
x : ndarray of shape (n_features,)

The input pattern which needs to be explained includes both continuous features and categorical features.

Returns:
y_pred : int

The predicted class of the input pattern.

dict_mem_val_classes : dictionary

A dictionary storing the membership values for all classes. The key is the class label and the value is the corresponding membership value.

dict_min_point_classes : dictionary

A dictionary storing the minimal points of the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the minimal point of the hyperbox corresponding to that class.

dict_max_point_classes : dictionary

A dictionary storing the maximal points of the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the maximal point of the hyperbox corresponding to that class.

dict_cat_point_classes : dictionary

A dictionary storing the categorical bounds of the hyperboxes having the maximum membership value for each class. The key is the class label and the value is the bound of categorical features of the hyperbox corresponding to that class.
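A hedged usage sketch, continuing the fitted classifier from the Examples section above; the unpacking simply follows the return values listed here:

>>> y_pred, mem_vals, min_points, max_points, cat_points = clf.get_sample_explanation(X[10])
>>> # y_pred is the predicted class; each dictionary maps a class label to its
>>> # membership value, minimal point, maximal point, and categorical bound,
>>> # respectively, for the most representative hyperbox of that class.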

is_satisfied_cat_expansion_conds(xd, Dj, n_cat_features)[source]

Check whether the expansion condition for categorical features xd of an input pattern can be covered by categorical features of the hyperbox \(B_j\) with the categorical features stored in Dj.

Parameters:
xd : array-like of shape (n_cat_features,)

Categorical features of an input pattern.

Dj : array-like of shape (n_cat_features,)

Categorical feature bounds of the hyperbox Bj which may be extended to cover the input pattern.

n_cat_features : int

Number of categorical features in the training set.

Returns:
bool

If True, the categorical features in Dj satisfy the expansion conditions for the categorical features, so the hyperbox can be expanded to cover the input pattern. Otherwise, the conditions for the categorical features are not met.

predict(X)[source]

Predict class labels for samples in X.

Note

In the case where there are multiple winner hyperboxes representing different class labels but with the same membership value with respect to the input pattern \(X_i\), an additional criterion based on the minimum Manhattan distance between the continuous features of \(X_i\) and the central points of the continuous features of the winner hyperboxes is used to find the final winner hyperbox, whose class label is used to predict the class label of the input pattern \(X_i\). If there are only categorical features and several winner hyperboxes belong to different classes, a random selection is used to choose the final class label.

Parameters:
X : array-like of shape (n_samples, n_features)

The data matrix for which we want to predict the targets.

Returns:
y_pred : ndarray of shape (n_samples,)

Vector containing the predictions. In binary and multiclass problems, this is a vector containing n_samples.

predict_proba(X)[source]

Predict class probabilities of the input samples X including both continuous and categorical features.

The predicted class probability is the ratio of the membership value of the representative hyperbox of that class to the sum of the membership values of the representative hyperboxes of all classes.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples.

Returns:
proba : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to the ascending order of the integer class labels.

predict_with_membership(X)[source]

Predict class membership values of the input samples X including both categorical and continuous features.

The predicted class membership value is the membership value of the representative hyperbox of that class.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples.

Returns:
mem_vals : ndarray of shape (n_samples, n_classes)

The class membership values of the input samples. The order of the classes corresponds to the ascending order of the integer class labels.
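Given the two descriptions above, the probabilities returned by predict_proba should correspond to the membership values normalised per sample. A small sketch checking this relation, reusing the classifier fitted in the Examples section (an illustrative assumption; rows with an all-zero membership sum would not normalise this way):

>>> import numpy as np
>>> mem_vals = clf.predict_with_membership(X[[10, 100]])
>>> proba = clf.predict_proba(X[[10, 100]])
>>> consistent = np.allclose(proba, mem_vals / mem_vals.sum(axis=1, keepdims=True))
>>> # expected to be True whenever every row has a non-zero membership sum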

simple_pruning(X_val, y_val, acc_threshold=0.5, keep_empty_boxes=False)[source]

Simply prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox.

Parameters:
X_val : array-like of shape (n_samples, n_features)

The data matrix containing both continuous and categorical features of the validation patterns.

y_val : ndarray of shape (n_samples,)

A vector containing the true class label of each validation pattern.

acc_threshold : float, optional, default=0.5

The minimum accuracy for each hyperbox to be kept unchanged.

keep_empty_boxes : boolean, optional, default=False

Whether to keep the hyperboxes which do not take part in the prediction process on the validation set. If True, they are kept; otherwise, the decision to keep or remove them is based on their classification accuracy on the validation dataset.

Returns:
self

A hyperbox-based model with the low-quality hyperboxes pruned.
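A hedged usage sketch: hold out a validation split, train, then prune. It reuses X, y and categorical_features from the Examples section; the split and threshold are illustrative choices only:

>>> from sklearn.model_selection import train_test_split
>>> X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
>>> clf = OneHotOnlineGFMM(theta=0.1, min_percent_overlap_cat=0.6)
>>> clf.fit(X_tr, y_tr, categorical_features)
>>> n_before = clf.get_n_hyperboxes()
>>> clf = clf.simple_pruning(X_val, y_val, acc_threshold=0.5, keep_empty_boxes=False)
>>> n_after = clf.get_n_hyperboxes()   # usually no larger than n_before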

hbbrain.mixed_data.onehot_onln_gfmm.impute_missing_value_cat_feature(Xd)[source]

Impute missing values of categorical features in Xd by a constant value.

Parameters:
Xd : array-like of shape (n_samples, n_cat_features)

Categorical features.

Returns:
Xd : array-like of shape (n_samples, n_cat_features)

Categorical features after data imputation.

hbbrain.mixed_data.onehot_onln_gfmm.one_hot_encoding_cat_feature(X, categorical_features, encodings=None)[source]

Encode categorical features by the one-hot encoding method.

Parameters:
X : array-like of shape (n_samples, n_features)

Input patterns.

categorical_features : array-like of shape (n_cat_features,)

Indices of categorical features.

encodings : a list of objects, optional, default=None

A list of one-hot encoders, one for each categorical feature.

Returns:
X_out : array-like of shape (n_samples, n_features)

An input data matrix with the encoded categorical features.

encodings_out : a list of objects

A list of one-hot encoders used to encode the categorical features, one for each categorical feature.
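A minimal sketch of how this helper might be called on a tiny mixed data set; the toy values are assumptions for illustration:

>>> import numpy as np
>>> from hbbrain.mixed_data.onehot_onln_gfmm import one_hot_encoding_cat_feature
>>> X = np.array([[0.2, 'red'], [0.7, 'blue'], [0.5, 'red']], dtype=object)
>>> X_out, encodings_out = one_hot_encoding_cat_feature(X, [1])
>>> # X_out keeps the continuous column and replaces column 1 with its one-hot
>>> # encoded values; encodings_out stores one encoder per categorical feature
>>> # and, presumably, can be passed back in via `encodings` to encode new data
>>> # consistently.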

hbbrain.mixed_data.onehot_onln_gfmm.predict_onehot_cat_feature_manhanttan(V, W, D, C, Xl, Xu, Xd, g=1)[source]

Predict class labels for mixed-attribute samples in X represented in the form of intervals [Xl, Xu, Xd]. This is a common function to determine the right class labels for X with respect to a trained hyperbox-based classifier represented by [V, W, D, C]. It uses the winner-takes-all principle to predict class labels for each sample in X by assigning the class label of the sample to the class label of the hyperbox with the maximum membership value to that sample. It uses a Manhattan distance for continuous features in the case of many hyperboxes with different classes having the same maximum membership value. If there are no continuous features, a random selection is used in the case of many winner hyperboxes.

Parameters:
Xl : array-like of shape (n_samples, n_continuous_features)

Lower bounds of continuous features of all input samples. If None, there are no continuous features.

Xu : array-like of shape (n_samples, n_continuous_features)

Upper bounds of continuous features of all input samples. If None, there are no continuous features.

Xd : array-like of shape (n_samples, n_cat_features)

Bounds of categorical features of all input patterns. If None, there are no categorical features.

V : array-like of shape (n_hyperboxes, n_continuous_features)

Minimum points of all continuous features of the existing hyperboxes in the trained model. If None, there are no continuous features.

W : array-like of shape (n_hyperboxes, n_continuous_features)

Maximum points of all continuous features of the existing hyperboxes in the trained model. If None, there are no continuous features.

D : array-like of shape (n_hyperboxes, n_cat_features)

Bounds of all categorical features of the existing hyperboxes in the trained model. If None, there are no categorical features.

C : array-like of shape (n_hyperboxes,)

Class labels of all existing hyperboxes corresponding to the values stored in V, W, and D.

g : float or ndarray of shape (n_continuous_features,), optional, default=1

A sensitivity parameter describing the speed of decrease of the membership function in each continuous dimension.

Returns:
y_pred : ndarray of shape (n_samples,)

A vector containing the predictions. In binary and multiclass problems, this is a vector containing n_samples.