{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model-level Bagging of Hyperbox-based Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example shows how to use a bagging classifier with model-level combination: the hyperboxes produced by many base learners are aggregated into a single model. Each hyperbox-based base learner is trained on the full set of features and a subset of the samples." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import train_test_split\n", "from hbbrain.numerical_data.ensemble_learner.model_comb_bagging import ModelCombinationBagging\n", "from hbbrain.numerical_data.incremental_learner.onln_gfmm import OnlineGFMM\n", "from hbbrain.numerical_data.batch_learner.accel_agglo_gfmm import AccelAgglomerativeLearningGFMM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load dataset\n", "This example uses the breast cancer dataset available in scikit-learn to demonstrate how to use this ensemble classifier. 
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer\n", "from sklearn.preprocessing import MinMaxScaler" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = load_breast_cancer()\n", "X = df.data\n", "y = df.target" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Normalise data into the range [0, 1], as hyperbox-based models only work in the unit cube\n", "scaler = MinMaxScaler()\n", "X = scaler.fit_transform(X)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Split data into training, validation and testing sets\n", "Xtr_val, X_test, ytr_val, y_test = train_test_split(X, y, train_size=0.8, random_state=0)\n", "# Split the training/validation portion (not the full data) so that the testing set stays unseen\n", "Xtr, X_val, ytr, y_val = train_test_split(Xtr_val, ytr_val, train_size=0.75, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**This example uses the GFMM classifier with the original online learning algorithm as the base learner. However, any hyperbox-based learning algorithm in this library can be used to train the base learners.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Using random subsampling to generate training sets for various base learners" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### a. 
Training without pruning for base learners" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Initialise parameters\n", "n_estimators = 20 # number of base learners\n", "max_samples = 0.5 # fraction of training samples drawn for each base learner\n", "bootstrap = False # random subsampling without replacement\n", "class_balanced = False # do not use the class-balanced sampling mode\n", "n_jobs = 4 # number of processes used to build base learners" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Initialise the hyperbox-based model used to train base learners\n", "# GFMM classifier with the original online learning algorithm and a maximum hyperbox size of 0.1\n", "base_estimator = OnlineGFMM(theta=0.1)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Initialise the hyperbox-based model used to aggregate the resulting hyperboxes from all base learners\n", "# The accelerated agglomerative learning algorithm for the GFMM model performs this task\n", "model_level_estimator = AccelAgglomerativeLearningGFMM(theta=0.1, min_simil=0, simil_measure='long')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_comb_bagging_subsampling = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n", 
"model_comb_bagging_subsampling.fit(Xtr, ytr)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time: 16.647 (s)\n" ] } ], "source": [ "print(\"Training time: %.3f (s)\"%(model_comb_bagging_subsampling.elapsed_training_time))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of hyperboxes in all base learners = 3948\n" ] } ], "source": [ "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_subsampling.get_n_hyperboxes())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes in the combined model = 401\n" ] } ], "source": [ "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_subsampling.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using majority voting from predicted results of all base learners" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "y_pred_voting = model_comb_bagging_subsampling.predict_voting(X_test)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy using voting of decisions from base learners = 93.86%\n" ] } ], "source": [ "acc_voting = accuracy_score(y_test, y_pred_voting)\n", "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using the final combined single model to make prediction" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Testing accuracy of the combined model = 92.98%\n" ] } ], "source": [ "y_pred = model_comb_bagging_subsampling.predict(X_test)\n", "acc = accuracy_score(y_test, y_pred)\n", "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply pruning for the final combined model" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acc_threshold=0.5 # minimum accuracy score of the unpruned hyperboxes\n", "keep_empty_boxes=False # False means hyperboxes that do not join the prediction process within the pruning procedure are also eliminated\n", "model_comb_bagging_subsampling.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes of the combined single model after pruning = 393\n" ] } ], "source": [ "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_subsampling.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction after doing a pruning procedure for the combined single model" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy after pruning the final model = 94.74%\n" ] } ], "source": [ "y_pred_2 = model_comb_bagging_subsampling.predict(X_test)\n", "acc_pruned = 
accuracy_score(y_test, y_pred_2)\n", "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### b. Training with pruning for base learners" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_comb_bagging_subsampling_base_learner_pruning = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n", "model_comb_bagging_subsampling_base_learner_pruning.fit(Xtr, ytr, is_pruning_base_learners=True, X_val=X_val, y_val=y_val, acc_threshold=acc_threshold, keep_empty_boxes=keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time: 8.254 (s)\n" ] } ], "source": [ "print(\"Training time: %.3f (s)\"%(model_comb_bagging_subsampling_base_learner_pruning.elapsed_training_time))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of hyperboxes in all base learners = 2195\n" ] } ], "source": [ "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes())" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Number of hyperboxes in the combined model = 388\n" ] } ], "source": [ "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using majority voting from predicted results of all base learners" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "y_pred_voting = model_comb_bagging_subsampling_base_learner_pruning.predict_voting(X_test)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy using voting of decisions from base learners = 95.61%\n" ] } ], "source": [ "acc_voting = accuracy_score(y_test, y_pred_voting)\n", "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using the final combined single model to make prediction" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy of the combined model = 94.74%\n" ] } ], "source": [ "y_pred = model_comb_bagging_subsampling_base_learner_pruning.predict(X_test)\n", "acc = accuracy_score(y_test, y_pred)\n", "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply pruning for the final combined model" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " 
model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acc_threshold=0.5 # minimum accuracy score of the unpruned hyperboxes\n", "keep_empty_boxes=False # False means hyperboxes that do not join the prediction process within the pruning procedure are also eliminated\n", "model_comb_bagging_subsampling_base_learner_pruning.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes of the combined single model after pruning = 383\n" ] } ], "source": [ "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction after doing a pruning procedure for the combined single model" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy after pruning the final model = 94.74%\n" ] } ], "source": [ "y_pred_2 = model_comb_bagging_subsampling_base_learner_pruning.predict(X_test)\n", "acc_pruned = accuracy_score(y_test, y_pred_2)\n", "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Using random undersampling to generate class-balanced training sets for various base learners" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### a. 
Training without pruning for base learners" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Initialise parameters\n", "n_estimators = 20 # number of base learners\n", "max_samples = 0.5 # fraction of training samples drawn for each base learner\n", "bootstrap = False # random subsampling without replacement\n", "class_balanced = True # use the class-balanced sampling mode\n", "n_jobs = 4 # number of processes used to build base learners" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Initialise the hyperbox-based model used to train base learners\n", "# GFMM classifier with the original online learning algorithm and a maximum hyperbox size of 0.1\n", "base_estimator = OnlineGFMM(theta=0.1)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Initialise the hyperbox-based model used to aggregate the resulting hyperboxes from all base learners\n", "# The accelerated agglomerative learning algorithm for the GFMM model performs this task\n", "model_level_estimator = AccelAgglomerativeLearningGFMM(theta=0.1, min_simil=0, simil_measure='long')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " class_balanced=True,\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_comb_bagging_class_balanced = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, 
random_state=0)\n", "model_comb_bagging_class_balanced.fit(Xtr, ytr)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time: 16.955 (s)\n" ] } ], "source": [ "print(\"Training time: %.3f (s)\"%(model_comb_bagging_class_balanced.elapsed_training_time))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of hyperboxes in all base learners = 4010\n" ] } ], "source": [ "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes())" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes in the combined model = 400\n" ] } ], "source": [ "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using majority voting from predicted results of all base learners" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "y_pred_voting = model_comb_bagging_class_balanced.predict_voting(X_test)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy using voting of decisions from base learners = 92.11%\n" ] } ], "source": [ "acc_voting = accuracy_score(y_test, y_pred_voting)\n", "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using the final combined single model to make prediction" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Testing accuracy of the combined model = 92.98%\n" ] } ], "source": [ "y_pred = model_comb_bagging_class_balanced.predict(X_test)\n", "acc = accuracy_score(y_test, y_pred)\n", "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply pruning for the final combined model" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " class_balanced=True,\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acc_threshold=0.5 # minimum accuracy score of the unpruned hyperboxes\n", "keep_empty_boxes=False # False means hyperboxes that do not join the prediction process within the pruning procedure are also eliminated\n", "model_comb_bagging_class_balanced.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes of the combined single model after pruning = 392\n" ] } ], "source": [ "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction after doing a pruning procedure for the combined single model" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy after pruning the final model = 94.74%\n" ] } ], "source": [ 
"y_pred_2 = model_comb_bagging_class_balanced.predict(X_test)\n", "acc_pruned = accuracy_score(y_test, y_pred_2)\n", "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### b. Training with pruning for base learners" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " class_balanced=True,\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_comb_bagging_class_balanced_base_learner_pruning = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n", "model_comb_bagging_class_balanced_base_learner_pruning.fit(Xtr, ytr, is_pruning_base_learners=True, X_val=X_val, y_val=y_val, acc_threshold=acc_threshold, keep_empty_boxes=keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training time: 7.264 (s)\n" ] } ], "source": [ "print(\"Training time: %.3f (s)\"%(model_comb_bagging_class_balanced_base_learner_pruning.elapsed_training_time))" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of hyperboxes in all base learners = 2738\n" ] } ], "source": [ "print('Total number of hyperboxes in all base learners = 
%d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes())" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes in the combined model = 395\n" ] } ], "source": [ "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using majority voting from predicted results of all base learners" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "y_pred_voting = model_comb_bagging_class_balanced_base_learner_pruning.predict_voting(X_test)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy using voting of decisions from base learners = 94.74%\n" ] } ], "source": [ "acc_voting = accuracy_score(y_test, y_pred_voting)\n", "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using the final combined single model to make prediction" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy of the combined model = 94.74%\n" ] } ], "source": [ "y_pred = model_comb_bagging_class_balanced_base_learner_pruning.predict(X_test)\n", "acc = accuracy_score(y_test, y_pred)\n", "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply pruning for the final combined model" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n", " V=array([], dtype=float64),\n", " W=array([], dtype=float64),\n", " theta=0.1),\n", " class_balanced=True,\n", " model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n", " simil_measure='long',\n", " theta=0.1),\n", " n_estimators=20, n_jobs=4, random_state=0)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acc_threshold=0.5 # minimum accuracy score of the unpruned hyperboxes\n", "keep_empty_boxes=False # False means hyperboxes that do not join the prediction process within the pruning procedure are also eliminated\n", "model_comb_bagging_class_balanced_base_learner_pruning.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of hyperboxes of the combined single model after pruning = 100\n" ] } ], "source": [ "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes_comb_model())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction after doing a pruning procedure for the combined single model" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing accuracy after pruning the final model = 94.74%\n" ] } ], "source": [ "y_pred_2 = model_comb_bagging_class_balanced_base_learner_pruning.predict(X_test)\n", "acc_pruned = accuracy_score(y_test, y_pred_2)\n", "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", 
"name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 4 }