ensemble_learner.decision_comb_cross_val_bagging

A Bagging ensemble of base hyperbox-based models, each trained on the full set of features and a subset of samples. Each base learner is tuned by random search-based hyper-parameter tuning with k-fold cross-validation. The predicted class is computed by majority voting over the decisions of the base models.

class hbbrain.numerical_data.ensemble_learner.decision_comb_cross_val_bagging.DecisionCombinationCrossValBagging(base_estimator=None, base_estimator_params={}, n_estimators=10, max_samples=0.5, bootstrap=False, class_balanced=False, n_iter=10, scoring='accuracy', k_fold=5, n_jobs=1, random_state=None)[source]

Bases: ClassifierMixin, BaseCrossValBagging

A Bagging classifier of base hyperbox-based models trained on the full set of features and a subset of samples, in which each base learner is tuned by random search-based hyper-parameter tuning and k-fold cross-validation.

A decision combination cross-validation Bagging classifier of hyperbox-based models is an ensemble meta-estimator that fits base hyperbox-based classifiers, each on a random subset of the original samples, using random search-based hyper-parameter tuning and k-fold cross-validation, and then aggregates their individual predictions by voting to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a single estimator, by introducing randomization into its construction procedure and then making an ensemble out of it. This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, this algorithm is known as Pasting [1]. If samples are drawn with replacement, the method is known as Bagging [2]. See [3] for more detailed information regarding the combination of base hyperbox-based models.
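
For illustration, a minimal sketch of the two sampling regimes mentioned above (the parameter values are arbitrary; bootstrap is the only switch between the two behaviors):

>>> from hbbrain.numerical_data.ensemble_learner.decision_comb_cross_val_bagging import DecisionCombinationCrossValBagging
>>> # Pasting [1]: sample subsets are drawn without replacement
>>> pasting_clf = DecisionCombinationCrossValBagging(n_estimators=10, bootstrap=False)
>>> # Bagging [2]: sample subsets are drawn with replacement
>>> bagging_clf = DecisionCombinationCrossValBagging(n_estimators=10, bootstrap=True)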

Parameters:
base_estimator : object, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is an OnlineGFMM.

base_estimator_params : dict or list of dicts, default={}

Dictionary with parameter names (str) as keys and distributions or lists of parameters to try. If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter is sampled using that dict as above. See the example after this parameter list.

n_estimators : int, default=10

The number of base estimators in the ensemble.

max_samples : int or float, default=0.5

The number of samples to draw from X to train each base estimator (without replacement by default; see bootstrap for more details):

- If int, then draw max_samples samples.
- If float, then draw max_samples * X.shape[0] samples.

See the example after this parameter list.

bootstrap : bool, default=False

Whether samples are drawn with replacement. If False, sampling without replacement is performed.

class_balanced : bool, default=False

Whether samples are drawn without replacement to build a final subset with an equal number of samples from each class.

n_iter : int, default=10

Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

scoring : str or callable, default='accuracy'

Strategy to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use:

- a single string (see The scoring parameter: defining model evaluation rules in sklearn);
- a callable (see Defining your scoring strategy from metric functions) that returns a single value.

k_fold : int, default=5

Determines the cross-validation splitting strategy. Possible inputs for k_fold are:

- None, to use the default 5-fold cross-validation;
- an integer, to specify the number of folds in a (Stratified)KFold.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used.

n_jobs : int, default=1

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

random_state : int, RandomState instance or None, default=None

Controls the random resampling of the original dataset (sample wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls.
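
As an illustration of the parameters above, the following sketch wires them together (the values are arbitrary demonstration choices, not recommended defaults):

>>> import numpy as np
>>> from hbbrain.numerical_data.incremental_learner.onln_gfmm import OnlineGFMM
>>> # Random search space: 'theta' is sampled from a grid of values,
>>> # 'gamma' from a discrete list of candidates.
>>> params = {'theta': np.arange(0.05, 1.01, 0.05), 'gamma': [0.5, 1, 2, 4, 8, 16]}
>>> clf = DecisionCombinationCrossValBagging(
...     base_estimator=OnlineGFMM(0.1),
...     base_estimator_params=params,
...     n_estimators=20,     # 20 base learners
...     max_samples=0.5,     # float: each learner sees 0.5 * X.shape[0] samples
...     bootstrap=False,     # draw those samples without replacement (Pasting)
...     n_iter=10,           # try 10 random parameter settings per learner
...     scoring='accuracy',  # single-string scoring rule
...     k_fold=5,            # 5-fold (stratified) cross-validation
...     random_state=0)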

References

[1]

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, vol. 36, no. 1, pp. 85-103, 1999.

[2]

L. Breiman, “Bagging predictors”, Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

[3]

B. Gabrys, “Combining neuro-fuzzy classifiers for improved generalisation and reliability”, in Proceedings of the 2002 International Joint Conference on Neural Networks, vol. 3, pp. 2410-2415, 2002.

Examples

>>> from hbbrain.numerical_data.incremental_learner.onln_gfmm import OnlineGFMM
>>> from hbbrain.numerical_data.ensemble_learner.decision_comb_cross_val_bagging import DecisionCombinationCrossValBagging
>>> from sklearn.datasets import make_classification
>>> import numpy as np
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(X)
MinMaxScaler()
>>> X = scaler.transform(X)
>>> clf = DecisionCombinationCrossValBagging(base_estimator=OnlineGFMM(0.1),
...                         base_estimator_params={'theta': np.arange(0.05, 1.01, 0.05),
...                                                'theta_min': [1],
...                                                'gamma': [0.5, 1, 2, 4, 8, 16]},
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[1, 0.6, 0.5, 0.2]])
array([1])
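
Continuing the example above, the probabilistic outputs described below have one column per class; make_classification generates two classes here, so the shapes are (1, 2) for a single query sample:

>>> clf.predict_proba([[1, 0.6, 0.5, 0.2]]).shape
(1, 2)
>>> clf.predict_with_membership([[1, 0.6, 0.5, 0.2]]).shape
(1, 2)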

Attributes:
base_estimator_ : estimator

The base estimator from which the ensemble is grown.

estimators_ : list of estimators

The collection of fitted base estimators.

estimators_samples_ : list of arrays

The subset of drawn samples for each base estimator.

classes_ : ndarray of shape (n_classes,)

The class labels.

n_classes_ : int or list

The number of classes.

Methods

fit(X, y)

Build a Bagging ensemble of estimators from the training set (X, y).

get_n_hyperboxes()

Get total number of hyperboxes in all base learners.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict class for X.

predict_proba(X)

Predict class probabilities for X.

predict_with_membership(X)

Predict class memberships for X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

simple_pruning_base_estimators(X_val, y_val)

Simply prune low-quality hyperboxes based on a pre-defined accuracy threshold for each hyperbox. A usage sketch follows.
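
A sketch of the pruning helper listed above, assuming a held-out validation split carved from the training data of the earlier example (X_tr, X_val, y_tr, y_val are hypothetical names introduced here):

>>> from sklearn.model_selection import train_test_split
>>> X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
>>> clf = DecisionCombinationCrossValBagging(base_estimator=OnlineGFMM(0.1),
...                                          n_estimators=10, random_state=0).fit(X_tr, y_tr)
>>> n_before = clf.get_n_hyperboxes()
>>> _ = clf.simple_pruning_base_estimators(X_val, y_val)  # prune low-quality hyperboxes; return value ignored here
>>> n_after = clf.get_n_hyperboxes()  # usually n_after <= n_before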

fit(X, y)[source]

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,)

The real class labels.

Returns:
self : object

Fitted estimator.

predict(X)[source]

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability using voting.

Parameters:
X : array-like of shape (n_samples, n_features)

The testing input samples.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.
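
Consistent with the description above, the predicted class should match the argmax over the averaged class probabilities, mapped back through classes_ (a sketch; possible tie-breaking differences aside):

>>> import numpy as np
>>> y_pred = clf.predict(X)
>>> y_from_proba = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]  # expected to agree with y_pred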

predict_proba(X)[source]

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the hyperbox-based learners in the ensemble model. The class probability of a single hyperbox-based learner is the ratio of the membership value of the representative hyperbox of that class to the sum of the membership values of the representative hyperboxes of all classes.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
all_probas : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to the ascending order of the integer class labels.
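
The normalization described above can be reproduced from the raw memberships (a sketch; predict_proba is expected to match this row-wise normalization):

>>> mem_vals = clf.predict_with_membership(X)
>>> probas = mem_vals / mem_vals.sum(axis=1, keepdims=True)  # per-class membership over total membership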

predict_with_membership(X)[source]

Predict class memberships for X.

The predicted class memberships of an input sample are computed as the mean predicted class memberships of the hyperbox-based learners in the ensemble model. The class membership of a single hyperbox-based learner is the membership value of the input X with respect to the representative hyperbox of that class taking part in the prediction procedure.

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
mem_vals : ndarray of shape (n_samples, n_classes)

The class memberships of the input samples. The order of the classes corresponds to the ascending order of the integer class labels.