ensemble_learner.cross_val_random_hyperboxes

Functions and classes for the cross-validation random hyperboxes model.

class hbbrain.numerical_data.ensemble_learner.cross_val_random_hyperboxes.CrossValRandomHyperboxesClassifier(base_estimator=None, base_estimator_params={}, n_estimators=10, max_samples=0.5, max_features='sqrt', class_balanced=False, feature_balanced=False, n_iter=10, scoring='accuracy', k_fold=5, n_jobs=1, random_state=None)

Bases: ClassifierMixin, BaseEnsemble

A Cross-validation Random Hyperboxes classifier: an ensemble of base hyperbox-based models, each trained on a subset of features and a subset of samples, combined with random search-based hyper-parameter tuning and k-fold cross-validation.

A Random Hyperboxes classifier of hyperbox-based models is an ensemble meta-estimator that fits base hyperbox-based classifiers, each on a random subset of both the original samples and the original features, using k-fold cross-validation and hyper-parameter tuning based on random search. The individual predictions of the base learners are then aggregated by voting to form the final prediction. Such a meta-estimator can typically be used to reduce the variance of a single estimator by introducing randomization into its construction procedure and then building an ensemble out of it. The subsets of features and samples are built by random subsampling without replacement. See [1] for more detailed information regarding the random hyperboxes classifier.
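The procedure described above can be sketched with scikit-learn building blocks. This is a minimal illustration under stated assumptions, not the library's implementation: KNeighborsClassifier stands in for a hyperbox-based learner, RandomizedSearchCV stands in for the tuning step, and the subset sizes and parameter grid are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
n_estimators, n_sub_samples, n_sub_features = 5, 50, 2  # illustrative sizes

members = []  # (tuned base learner, feature indices it was trained on)
for _ in range(n_estimators):
    # Random subsets of samples and features, drawn without replacement.
    rows = rng.choice(X.shape[0], size=n_sub_samples, replace=False)
    cols = rng.choice(X.shape[1], size=n_sub_features, replace=False)
    # Random search with 5-fold cross-validation on the subset.
    search = RandomizedSearchCV(
        KNeighborsClassifier(),  # stand-in for a hyperbox-based learner
        param_distributions={"n_neighbors": [1, 3, 5, 7]},
        n_iter=3, cv=5, scoring="accuracy", random_state=0,
    )
    search.fit(X[rows][:, cols], y[rows])
    members.append((search.best_estimator_, cols))

# Aggregate the individual predictions by majority vote.
votes = np.stack([model.predict(X[:, cols]) for model, cols in members])
y_pred = np.array([np.bincount(sample_votes).argmax() for sample_votes in votes.T])
```

Each base learner only ever sees its own columns, so the same feature indices must be applied again at prediction time.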

Parameters:
base_estimator : object, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is an OnlineGFMM.

base_estimator_params : dict or list of dicts, default={}

Dictionary with parameter names (str) as keys and distributions or lists of parameter values to try as values. If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter is sampled using that dict as above.
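The sampling behaviour described above matches scikit-learn's ParameterSampler, so that class is used here to illustrate what n_iter random draws from such a dictionary look like. Whether the classifier uses ParameterSampler internally is an assumption; the parameter names theta and gamma are taken from the example on this page.

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

# Lists (or 1-D arrays) of candidate values are sampled uniformly.
param_space = {
    "theta": np.arange(0.05, 1.01, 0.05),
    "gamma": [0.5, 1, 2, 4, 8, 16],
}

# Draw 10 random parameter settings, mirroring n_iter=10.
candidates = list(ParameterSampler(param_space, n_iter=10, random_state=0))
for params in candidates[:3]:
    print(params)
```

Each element of candidates is a plain dict that could be passed to a base estimator through set_params(**params).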

n_estimators : int, default=10

The number of base estimators in the ensemble.

max_samples : int or float, default=0.5

The number of samples to draw from X, without replacement, to train each base estimator.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples.

max_features : {“sqrt”, “log2”}, int or float, default=“sqrt”

The maximum number of features to consider when building training data for base learners:

  • If int, then consider max_features features.

  • If float, then max_features is a fraction and round(max_features * n_features) features are considered.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).
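The rules for max_samples and max_features can be written as two small helpers. The function names are hypothetical, and truncating towards zero for the float, “sqrt” and “log2” cases (other than the documented round() for float max_features) is an assumption, as is clamping the result to at least one.

```python
import math

def resolve_max_samples(max_samples, n_samples):
    """Number of samples drawn for one base learner (hypothetical helper)."""
    if isinstance(max_samples, float):
        return max(1, int(max_samples * n_samples))  # fraction of X.shape[0]
    return int(max_samples)                          # absolute count

def resolve_max_features(max_features, n_features):
    """Number of features considered for one base learner (hypothetical helper)."""
    if max_features == "sqrt":
        n = int(math.sqrt(n_features))
    elif max_features == "log2":
        n = int(math.log2(n_features))
    elif isinstance(max_features, float):
        n = round(max_features * n_features)         # fraction of n_features
    else:
        n = int(max_features)                        # absolute count
    return max(1, min(n, n_features))

print(resolve_max_samples(0.5, 100))     # 50
print(resolve_max_features("sqrt", 16))  # 4
print(resolve_max_features("log2", 8))   # 3
```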

class_balanced : bool, default=False

Whether samples are drawn (without replacement) so that the final subset contains an equal number of samples from each class.

feature_balanced : bool, default=False

Whether the training sets of all base learners have the same number of features.

n_iter : int, default=10

Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

scoring : str or callable, default='accuracy'

Strategy to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use:

  • a single string (see The scoring parameter: defining model evaluation rules in sklearn).

  • a callable (see Defining your scoring strategy from metric functions) that returns a single value.
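As an illustration of the callable form, scikit-learn's make_scorer turns a metric function into a scorer that returns a single value. The sketch below checks it with cross_val_score and a decision tree as a stand-in estimator; passing the same object as scoring=macro_f1 to this classifier is assumed to work the same way.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A callable scorer that returns one value per evaluation: macro-averaged F1.
macro_f1 = make_scorer(f1_score, average="macro")

X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Stand-in estimator, used only to show the scorer in action.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=5, scoring=macro_f1)
print(scores.mean())
```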

k_fold : int, default=5

Determines the cross-validation splitting strategy. Possible inputs for k_fold are:

  • None, to use the default 5-fold cross-validation.

  • integer, to specify the number of folds in a (Stratified)KFold.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, Stratified K-Fold is used.
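To make the stratified behaviour concrete, the sketch below splits a toy label vector with scikit-learn's StratifiedKFold. The data are made up for illustration; only the splitter itself is a real scikit-learn class.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 30 samples of class 0 and 20 samples of class 1.
y = np.array([0] * 30 + [1] * 20)
X = np.zeros((y.size, 4))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=5)
fold_counts = []
for train_idx, val_idx in skf.split(X, y):
    # Per-class sample counts in the validation part of this fold.
    fold_counts.append(np.bincount(y[val_idx]))

print(fold_counts)
```

Every validation fold keeps the 3:2 class ratio of the full label vector, which is what distinguishes stratified splitting from plain K-Fold.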

n_jobs : int, default=1

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

random_state : int, RandomState instance or None, default=None

Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls.

References

[1]

T. T. Khuat and B. Gabrys, “Random Hyperboxes”, IEEE Transactions on Neural Networks and Learning Systems, 2021.

Examples

>>> import numpy as np
>>> from hbbrain.numerical_data.incremental_learner.iol_gfmm import ImprovedOnlineGFMM
>>> from hbbrain.numerical_data.ensemble_learner.cross_val_random_hyperboxes import CrossValRandomHyperboxesClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(X)
MinMaxScaler()
>>> X = scaler.transform(X)
>>> clf = CrossValRandomHyperboxesClassifier(base_estimator=ImprovedOnlineGFMM(0.1),
...                         base_estimator_params={'theta': np.arange(0.05, 1.01, 0.05), 'gamma':[0.5, 1, 2, 4, 8, 16]},
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[1, 0.6, 0.5, 0.2]])
array([1])

Attributes:
base_estimator_ : estimator

The base estimator from which the ensemble is grown.

n_features_ : int

Number of features seen during fit.

estimators_ : list of estimators

The collection of fitted base estimators.

estimators_samples_ : list of arrays

The subset of drawn samples for each base estimator.

classes_ : ndarray of shape (n_classes,)

The class labels.

n_classes_ : int or list

The number of classes.

Methods

fit(X, y)

Build a random hyperbox model from the training set (X, y).

get_n_hyperboxes()

Get total number of hyperboxes in all base learners.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict class for X.

predict_proba(X)

Predict class probabilities for X.

predict_with_membership(X)

Predict class memberships for X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

simple_pruning_base_estimators(X_val, y_val)

Prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox.

property estimators_samples_

The subset of drawn samples for each base estimator. Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note

The list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
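The trade-off in this note can be sketched as a property that stores only one seed per base learner and regenerates the indices on demand. The class below is hypothetical and only illustrates the idea; how the library actually stores and replays its seeds is not shown on this page.

```python
import numpy as np

class SeededSubsets:
    """Hypothetical container that keeps seeds instead of index arrays."""

    def __init__(self, seeds, n_samples, n_sub_samples):
        self._seeds = list(seeds)
        self._n_samples = n_samples
        self._n_sub_samples = n_sub_samples

    @property
    def samples_(self):
        # Rebuilt on every access: low memory use, extra computation.
        return [
            np.random.default_rng(seed).choice(
                self._n_samples, size=self._n_sub_samples, replace=False
            )
            for seed in self._seeds
        ]

subsets = SeededSubsets(seeds=[1, 2, 3], n_samples=100, n_sub_samples=50)
first, second = subsets.samples_, subsets.samples_
```

Both accesses return equal index arrays because the same seeds are replayed, but the list objects are distinct, so callers that need the indices repeatedly may want to cache the result.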

fit(X, y)

Build a random hyperbox model from the training set (X, y).

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,)

The class labels.

Returns:
self : object

Fitted estimator.

get_n_hyperboxes()

Get total number of hyperboxes in all base learners.

Returns:
int

Total number of hyperboxes in all base learners.

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability using voting.

Parameters:
X : array-like of shape (n_samples, n_features)

The testing input samples.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the hyperbox-based learners in the ensemble model. For a single hyperbox-based learner, the probability of a class is the ratio of the membership value of the representative hyperbox of that class to the sum of the membership values of the representative hyperboxes of all classes.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
all_probas : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The classes are ordered by ascending integer class label.
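The computation described for predict_proba amounts to a per-learner normalisation followed by an average over learners. The sketch below uses made-up membership values for two learners, two samples and two classes; the array layout is an assumption chosen for illustration.

```python
import numpy as np

# memberships[e, i, c]: membership of sample i in the representative
# hyperbox of class c, as reported by base learner e (made-up values).
memberships = np.array([
    [[0.9, 0.6], [0.2, 0.8]],   # learner 0
    [[0.7, 0.7], [0.5, 1.0]],   # learner 1
])

# Per learner: divide each class membership by the sum over all classes.
per_learner_proba = memberships / memberships.sum(axis=2, keepdims=True)

# Ensemble: mean of the per-learner probabilities.
proba = per_learner_proba.mean(axis=0)

# Predicted class: the one with the highest mean probability.
y_pred = proba.argmax(axis=1)
print(proba)
print(y_pred)  # [0 1]
```

Because every per-learner row is normalised before averaging, each row of proba also sums to one.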

predict_with_membership(X)

Predict class memberships for X.

The predicted class memberships of an input sample are computed as the mean predicted class memberships of the hyperbox-based learners in the ensemble model. For a single hyperbox-based learner, the membership of a class is the membership value of the input X in the representative hyperbox of that class, i.e., the hyperbox of that class that takes part in the prediction procedure.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
mem_vals : ndarray of shape (n_samples, n_classes)

The class memberships of the input samples. The classes are ordered by ascending integer class label.

simple_pruning_base_estimators(X_val, y_val, acc_threshold=0.5, keep_empty_boxes=False)

Prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox. This operation is applied to all base estimators.

Parameters:
X_val : array-like of shape (n_samples, n_features)

The data matrix containing the validation patterns.

y_val : ndarray of shape (n_samples,)

A vector containing the true class label of each validation pattern.

acc_threshold : float, optional, default=0.5

The minimum accuracy for each hyperbox to be kept unchanged.

keep_empty_boxes : boolean, optional, default=False

Whether to keep the hyperboxes that do not take part in the prediction process on the validation set. If True, they are kept; otherwise, whether they are kept or removed is decided based on the classification accuracy on the validation dataset.

Returns:
self

A random hyperboxes model with its base estimators pruned.
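The idea of threshold-based pruning can be sketched for a single base learner as follows. The function and its inputs are hypothetical: it assumes that, for every validation sample, the index of the hyperbox that decided the prediction is known. Unused hyperboxes are simply dropped when keep_empty_boxes is False, which is a simplification of the accuracy-based decision described above.

```python
import numpy as np

def prune_mask(winner_idx, y_pred, y_val, n_boxes,
               acc_threshold=0.5, keep_empty_boxes=False):
    """Return a boolean mask of hyperboxes to keep (hypothetical helper).

    winner_idx[i] is the index of the hyperbox that produced y_pred[i].
    """
    winner_idx = np.asarray(winner_idx)
    correct = (np.asarray(y_pred) == np.asarray(y_val)).astype(float)

    # How often each hyperbox was used, and how often it was right.
    n_used = np.bincount(winner_idx, minlength=n_boxes)
    n_correct = np.bincount(winner_idx, weights=correct, minlength=n_boxes)

    keep = np.zeros(n_boxes, dtype=bool)
    used = n_used > 0
    # Used hyperboxes survive if their accuracy reaches the threshold.
    keep[used] = n_correct[used] / n_used[used] >= acc_threshold
    # Simplification: unused hyperboxes follow keep_empty_boxes directly.
    keep[~used] = keep_empty_boxes
    return keep

mask = prune_mask(winner_idx=[0, 0, 1, 1, 1, 3],
                  y_pred=[0, 0, 1, 1, 1, 0],
                  y_val=[0, 1, 1, 1, 0, 1],
                  n_boxes=4)
print(mask)  # [ True  True False False]
```

Here hyperbox 0 is right once in two uses, hyperbox 1 twice in three, hyperbox 3 never, and hyperbox 2 is unused.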