ensemble_learner.decision_comb_bagging

A bagging ensemble of base hyperbox-based models, each trained on the full set of features and a subset of samples. The predicted class is obtained by voting over the decisions of the base models.

class hbbrain.numerical_data.ensemble_learner.decision_comb_bagging.DecisionCombinationBagging(base_estimator=None, n_estimators=10, max_samples=0.5, bootstrap=False, class_balanced=False, n_jobs=1, random_state=None)

Bases: ClassifierMixin, BaseBagging

A Bagging classifier of base hyperbox-based models trained on a full set of features and a subset of samples.

A decision combination Bagging classifier of hyperbox-based models is an ensemble meta-estimator that fits each base hyperbox-based classifier on a random subset of the original samples and then aggregates the individual predictions by voting to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a single estimator by introducing randomization into its construction procedure and then building an ensemble from it. This algorithm encompasses several works from the literature. When the random subsets of the dataset are drawn as random subsets of the samples, the algorithm is known as Pasting [1]. If the samples are drawn with replacement, the method is known as Bagging [2]. See [3] for more detailed information on combining base hyperbox-based models.
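The difference between Pasting and Bagging comes down to how the sample indices for each base learner are drawn. The sketch below illustrates this with the Python standard library only; the helper name `draw_sample_indices` is hypothetical and is not part of hbbrain.

```python
import random

def draw_sample_indices(n_samples, n_draw, bootstrap, rng):
    """Return the indices of one base learner's training subset."""
    if bootstrap:
        # Bagging: draw with replacement, so an index may appear twice.
        return [rng.randrange(n_samples) for _ in range(n_draw)]
    # Pasting: draw without replacement, so every index is distinct.
    return rng.sample(range(n_samples), n_draw)

rng = random.Random(0)
pasting_idx = draw_sample_indices(100, 50, bootstrap=False, rng=rng)
bagging_idx = draw_sample_indices(100, 50, bootstrap=True, rng=rng)
```

With `bootstrap=False` the 50 indices are guaranteed to be unique, whereas with `bootstrap=True` duplicates are possible.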

Parameters:
base_estimator : object, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is an OnlineGFMM.

n_estimators : int, default=10

The number of base estimators in the ensemble.

max_samples : int or float, default=0.5

The number of samples to draw from X to train each base estimator (without replacement by default; see bootstrap for more details).

- If int, then draw max_samples samples.
- If float, then draw max_samples * X.shape[0] samples.
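The two interpretations of max_samples can be sketched as follows. The helper `resolve_max_samples` is a hypothetical name, and truncating the float product to an integer is an assumption of this sketch rather than documented library behavior.

```python
def resolve_max_samples(max_samples, n_samples):
    """Translate max_samples into an absolute number of samples to draw."""
    if isinstance(max_samples, int):
        # An int is taken as the sample count itself.
        return max_samples
    # A float is a fraction of the training set size; truncating the
    # product to an int is an assumption of this sketch.
    return int(max_samples * n_samples)

n_from_int = resolve_max_samples(30, n_samples=100)
n_from_float = resolve_max_samples(0.5, n_samples=100)
```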

bootstrap : bool, default=False

Whether samples are drawn with replacement. If False, sampling without replacement is performed.

class_balanced : bool, default=False

Whether samples are drawn, without replacement, so that the final subset contains an equal number of samples from each class.
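One way to picture class-balanced sampling is to group the sample indices by label and draw the same quota from every class. This is only an illustrative sketch: the helper `class_balanced_indices` is hypothetical, and capping the quota at the size of the smallest class is an assumption, not a statement about how hbbrain implements the option.

```python
import random
from collections import defaultdict

def class_balanced_indices(y, n_draw, rng):
    """Draw roughly n_draw indices with an equal count per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    # Equal quota per class, capped by the smallest class (assumption).
    smallest = min(len(members) for members in by_class.values())
    per_class = min(n_draw // len(by_class), smallest)
    chosen = []
    for label in sorted(by_class):
        # rng.sample draws without replacement within each class.
        chosen.extend(rng.sample(by_class[label], per_class))
    return chosen

y = [0] * 70 + [1] * 30          # imbalanced toy labels
idx = class_balanced_indices(y, n_draw=40, rng=random.Random(0))
```

Here each class contributes 20 indices even though class 0 is more than twice as frequent in `y`.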

n_jobs : int, default=1

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

random_state : int, RandomState instance or None, default=None

Controls the random resampling of the original dataset (sample-wise and feature-wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls.
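The idea of deriving one seed per ensemble member from a single random_state can be sketched as below. The helper `derive_estimator_seeds` and the seed range are assumptions made for illustration; they do not describe the library's internal seeding code.

```python
import random

def derive_estimator_seeds(random_state, n_estimators):
    """Derive one integer seed per base estimator from a master seed."""
    master = random.Random(random_state)
    # Each base estimator gets its own seed drawn from the master stream.
    return [master.randrange(2**31 - 1) for _ in range(n_estimators)]

seeds_a = derive_estimator_seeds(0, 10)
seeds_b = derive_estimator_seeds(0, 10)
```

Because both calls start from the same integer, `seeds_a` and `seeds_b` are identical, which is what makes the resampling reproducible across calls.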

References

[1]

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, vol. 36, no. 1, pp. 85-103, 1999.

[2]

L. Breiman, “Bagging predictors”, Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

[3]

B. Gabrys, “Combining neuro-fuzzy classifiers for improved generalisation and reliability”, in Proceedings of the 2002 International Joint Conference on Neural Networks, vol. 3, pp. 2410-2415, 2002.

Examples

>>> from hbbrain.numerical_data.incremental_learner.iol_gfmm import ImprovedOnlineGFMM
>>> from hbbrain.numerical_data.ensemble_learner.decision_comb_bagging import DecisionCombinationBagging
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> scaler.fit(X)
MinMaxScaler()
>>> X = scaler.transform(X)
>>> clf = DecisionCombinationBagging(base_estimator=ImprovedOnlineGFMM(0.1),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[1, 0.6, 0.5, 0.2]])
array([1])

Attributes:
base_estimator_ : estimator

The base estimator from which the ensemble is grown.

estimators_ : list of estimators

The collection of fitted base estimators.

estimators_samples_ : list of arrays

The subset of drawn samples for each base estimator.

classes_ : ndarray of shape (n_classes,)

The class labels.

n_classes_ : int or list

The number of classes.

Methods

fit(X, y)

Build a Bagging ensemble of estimators from the training set (X, y).

get_n_hyperboxes()

Get total number of hyperboxes in all base learners.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict class for X.

predict_proba(X)

Predict class probabilities for X.

predict_with_membership(X)

Predict class memberships for X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

simple_pruning_base_estimators(X_val, y_val)

Prune low-quality hyperboxes based on a predefined accuracy threshold for each hyperbox.

fit(X, y)

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,)

The true class labels.

Returns:
self : object

Fitted estimator.

predict(X)

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability using voting.
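The voting rule described above amounts to averaging the per-learner class probabilities and taking the class with the largest mean. The following is a minimal sketch with hypothetical names (`vote`, `per_estimator_probas`); breaking ties in favor of the first class is an assumption of the sketch.

```python
def vote(per_estimator_probas, classes):
    """Pick the class with the highest mean probability over learners."""
    n_estimators = len(per_estimator_probas)
    # Average each class column over all base learners.
    mean_probas = [sum(col) / n_estimators for col in zip(*per_estimator_probas)]
    # max() returns the first maximal index, so ties go to the
    # lowest-ordered class (assumption).
    best = max(range(len(classes)), key=mean_probas.__getitem__)
    return classes[best]

# Three base learners, two classes, one input sample.
probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
predicted = vote(probas, classes=[0, 1])
```

In this toy case the first learner prefers class 0, but the mean probability is higher for class 1, so `predicted` is 1.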

Parameters:
X : array-like of shape (n_samples, n_features)

The testing input samples.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_proba(X)

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the hyperbox-based learners in the ensemble model. For a single hyperbox-based learner, the probability of a class is the ratio of the membership value of the representative hyperbox of that class to the sum of the membership values of the representative hyperboxes of all classes.
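The two steps described above (normalizing each learner's memberships into probabilities, then averaging over learners) can be sketched for a single input sample as follows. The function names are hypothetical, and returning a uniform distribution when all memberships are zero is an assumption of the sketch.

```python
def memberships_to_probas(memberships):
    """Normalize one learner's per-class memberships so they sum to 1."""
    total = sum(memberships)
    if total == 0:
        # Assumption: fall back to a uniform distribution.
        return [1.0 / len(memberships)] * len(memberships)
    return [m / total for m in memberships]

def ensemble_probas(per_learner_memberships):
    """Average the normalized class probabilities over all learners."""
    probas = [memberships_to_probas(m) for m in per_learner_memberships]
    n_learners = len(probas)
    return [sum(col) / n_learners for col in zip(*probas)]

# Two learners, two classes: memberships of the representative hyperboxes.
all_probas = ensemble_probas([[1.0, 0.25], [0.5, 0.5]])
```

The first learner's memberships normalize to roughly 0.8 and 0.2, the second learner's to 0.5 and 0.5, so the ensemble probabilities are roughly 0.65 and 0.35.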

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
all_probas : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The classes are ordered by ascending integer class label.

predict_with_membership(X)

Predict class memberships for X.

The predicted class memberships of an input sample are computed as the mean predicted class memberships of the hyperbox-based learners in the ensemble model. For a single hyperbox-based learner, the membership of a class is the membership value of the input sample in the representative hyperbox of that class.
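Unlike the probabilities, the memberships are averaged as they are, without normalization. A minimal sketch for a batch of samples is shown below; `mean_memberships` and the nested-list layout are assumptions chosen for illustration only.

```python
def mean_memberships(per_learner_memberships):
    """Average raw class memberships over learners.

    per_learner_memberships[k][i][c] is the membership of sample i in the
    representative hyperbox of class c, as reported by learner k.
    """
    n_learners = len(per_learner_memberships)
    result = []
    for sample_rows in zip(*per_learner_memberships):
        # sample_rows holds one row per learner for the same sample.
        result.append([sum(col) / n_learners for col in zip(*sample_rows)])
    return result

mem_vals = mean_memberships([
    [[1.0, 0.5], [0.25, 0.75]],   # learner 0, two samples
    [[0.5, 0.5], [0.75, 0.25]],   # learner 1, two samples
])
```

The result has one row per sample and one column per class, matching the (n_samples, n_classes) shape documented for mem_vals.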

Parameters:
X : array-like of shape (n_samples, n_features)

The input samples for prediction.

Returns:
mem_vals : ndarray of shape (n_samples, n_classes)

The class memberships of the input samples. The classes are ordered by ascending integer class label.