{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model-level Bagging of Hyperbox-based Models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to use a bagging classifier that combines its base learners at the model level to produce a single model. Each base hyperbox-based model is trained on the full set of features and a subset of the samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "from hbbrain.numerical_data.ensemble_learner.model_comb_bagging import ModelCombinationBagging\n",
    "from hbbrain.numerical_data.incremental_learner.onln_gfmm import OnlineGFMM\n",
    "from hbbrain.numerical_data.batch_learner.accel_agglo_gfmm import AccelAgglomerativeLearningGFMM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load dataset\n",
    "This example uses the breast cancer dataset available in scikit-learn to demonstrate how to use this ensemble classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.preprocessing import MinMaxScaler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_breast_cancer()\n",
    "X = df.data\n",
    "y = df.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalise data into the range [0, 1] because hyperbox-based models only work in the unit hypercube\n",
    "scaler = MinMaxScaler()\n",
    "X = scaler.fit_transform(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split data into training, validation and testing sets (roughly 60% / 20% / 20%)\n",
    "Xtr_val, X_test, ytr_val, y_test = train_test_split(X, y, train_size=0.8, random_state=0)\n",
    "# Split the training+validation portion (not the full data) so that the test set stays unseen\n",
    "Xtr, X_val, ytr, y_val = train_test_split(Xtr_val, ytr_val, train_size=0.75, random_state=0)"
   ]
  },
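  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A three-way split is normally built in two steps: hold out the test set first, then divide the remaining samples into training and validation sets. The cell below is a minimal sanity check on hypothetical stand-in data (`ids` and `labels` are made up for illustration and are not part of the example dataset). It confirms that such a two-step split produces three disjoint sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Hypothetical stand-in data: sample ids double as features so overlaps are easy to detect\n",
    "ids = np.arange(100)\n",
    "labels = ids % 2\n",
    "\n",
    "# Hold out 20% for testing, then split the remainder 75/25 into training and validation\n",
    "ids_tr_val, ids_test, lab_tr_val, lab_test = train_test_split(ids, labels, train_size=0.8, random_state=0)\n",
    "ids_tr, ids_val, lab_tr, lab_val = train_test_split(ids_tr_val, lab_tr_val, train_size=0.75, random_state=0)\n",
    "\n",
    "# No sample id may appear in more than one of the three sets\n",
    "tr_set, val_set, test_set = set(ids_tr.tolist()), set(ids_val.tolist()), set(ids_test.tolist())\n",
    "no_overlap = not (tr_set & val_set) and not (tr_set & test_set) and not (val_set & test_set)\n",
    "print(len(ids_tr), len(ids_val), len(ids_test))\n",
    "print('Disjoint splits:', no_overlap)"
   ]
  },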
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**This example uses the GFMM classifier with the original online learning algorithm as the base learner. However, any hyperbox-based learning algorithm in this library can be used to train base learners.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Using random subsampling to generate training sets for various base learners"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### a. Training without pruning for base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise parameters\n",
    "n_estimators = 20 # number of base learners\n",
    "max_samples = 0.5 # sampling rate for samples\n",
    "bootstrap = False # random subsampling without replacement\n",
    "class_balanced = False # do not use the class-balanced sampling mode\n",
    "n_jobs = 4 # number of processes used to build base learners"
   ]
  },
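  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `max_samples` and `bootstrap` settings control how the training subset of each base learner is drawn. The cell below is a minimal NumPy sketch of that idea. `draw_subset` is a hypothetical helper written for this illustration, not a function of the library, and the library's internal sampling may differ in detail."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def draw_subset(n_samples, max_samples, bootstrap, rng):\n",
    "    # Hypothetical helper: return the row indices used to train one base learner\n",
    "    n_draw = int(max_samples * n_samples)\n",
    "    # bootstrap=True samples with replacement; bootstrap=False gives a plain random subsample\n",
    "    return rng.choice(n_samples, size=n_draw, replace=bootstrap)\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "subsample_idx = draw_subset(10, 0.5, bootstrap=False, rng=rng)\n",
    "bootstrap_idx = draw_subset(10, 0.5, bootstrap=True, rng=rng)\n",
    "print('Without replacement:', np.sort(subsample_idx))\n",
    "print('With replacement:', np.sort(bootstrap_idx))"
   ]
  },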
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Init a hyperbox-based model used to train base learners\n",
    "# Using the GFMM classifier with the original online learning algorithm with the maximum hyperbox size 0.1\n",
    "base_estimator = OnlineGFMM(theta=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Init a hyperbox-based model used to aggregate the resulting hyperboxes from all base learners\n",
    "# Using the accelerated agglomerative learning algorithm for the GFMM model to do this task\n",
    "model_level_estimator = AccelAgglomerativeLearningGFMM(theta=0.1, min_simil=0, simil_measure='long')"
   ]
  },
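  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Model-level combination merges hyperboxes coming from different base learners into larger ones as long as the result still respects the maximum hyperbox size `theta`. The cell below sketches only that size test for two hyperboxes of the same class. `try_merge` is a hypothetical helper, and the sketch deliberately omits the similarity threshold (`min_simil`) and any overlap handling that the actual agglomerative algorithm applies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def try_merge(v1, w1, v2, w2, theta):\n",
    "    # Hypothetical helper: v* are lower corners and w* are upper corners of two same-class hyperboxes\n",
    "    v_new = np.minimum(v1, v2)\n",
    "    w_new = np.maximum(w1, w2)\n",
    "    # Accept the merge only if every side of the enclosing box is at most theta\n",
    "    fits = bool(np.all(w_new - v_new <= theta))\n",
    "    return fits, v_new, w_new\n",
    "\n",
    "box_a = (np.array([0.10, 0.10]), np.array([0.15, 0.15]))\n",
    "box_b = (np.array([0.12, 0.12]), np.array([0.18, 0.19]))\n",
    "fits_loose, v_merged, w_merged = try_merge(*box_a, *box_b, theta=0.1)\n",
    "fits_tight = try_merge(*box_a, *box_b, theta=0.05)[0]\n",
    "print(fits_loose, fits_tight)"
   ]
  },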
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_comb_bagging_subsampling = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n",
    "model_comb_bagging_subsampling.fit(Xtr, ytr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training time: 16.647 (s)\n"
     ]
    }
   ],
   "source": [
    "print(\"Training time: %.3f (s)\"%(model_comb_bagging_subsampling.elapsed_training_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of hyperboxes in all base learners = 3948\n"
     ]
    }
   ],
   "source": [
    "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_subsampling.get_n_hyperboxes())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes in the combined model = 401\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_subsampling.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using majority voting from predicted results of all base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred_voting = model_comb_bagging_subsampling.predict_voting(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy using voting of decisions from base learners =  93.86%\n"
     ]
    }
   ],
   "source": [
    "acc_voting = accuracy_score(y_test, y_pred_voting)\n",
    "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')"
   ]
  },
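  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Majority voting assigns each sample the class predicted by most base learners. The cell below is a minimal NumPy sketch of that rule on toy predictions. `majority_vote` is a hypothetical helper, and its tie-breaking choice (smallest class label wins) is an assumption of this sketch, not a statement about how `predict_voting` resolves ties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def majority_vote(predictions):\n",
    "    # predictions: integer class labels with shape (n_learners, n_samples)\n",
    "    predictions = np.asarray(predictions)\n",
    "    n_classes = predictions.max() + 1\n",
    "    # Count the votes per class for every sample (one column per sample)\n",
    "    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)\n",
    "    # Ties go to the smallest class label because argmax returns the first maximum\n",
    "    return votes.argmax(axis=0)\n",
    "\n",
    "# Three toy base learners voting on three samples\n",
    "toy_predictions = [[0, 1, 1], [0, 0, 1], [1, 0, 1]]\n",
    "print(majority_vote(toy_predictions))"
   ]
  },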
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the final combined single model to make predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy of the combined model =  92.98%\n"
     ]
    }
   ],
   "source": [
    "y_pred = model_comb_bagging_subsampling.predict(X_test)\n",
    "acc = accuracy_score(y_test, y_pred)\n",
    "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply pruning for the final combined model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acc_threshold = 0.5 # minimum validation accuracy a hyperbox needs to survive pruning\n",
    "keep_empty_boxes = False # False: hyperboxes that take part in no prediction during pruning are also removed\n",
    "model_comb_bagging_subsampling.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes of the combined single model after pruning = 393\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_subsampling.get_n_hyperboxes_comb_model())"
   ]
  },
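  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two pruning parameters can be read as a per-hyperbox filter on the validation set: a hyperbox survives if its accuracy on the samples it classified reaches `acc_threshold`, and hyperboxes that classified no sample are kept only when `keep_empty_boxes` is true. The cell below sketches that rule. `pruning_mask` and its inputs are hypothetical, and the use of `>=` for the threshold is an assumption of this sketch; the library's exact rule may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def pruning_mask(winner_idx, is_correct, n_boxes, acc_threshold=0.5, keep_empty_boxes=False):\n",
    "    # winner_idx[i]: index of the hyperbox that classified validation sample i\n",
    "    # is_correct[i]: whether that prediction matched the true label\n",
    "    winner_idx = np.asarray(winner_idx)\n",
    "    is_correct = np.asarray(is_correct, dtype=float)\n",
    "    n_used = np.bincount(winner_idx, minlength=n_boxes)\n",
    "    n_right = np.bincount(winner_idx, weights=is_correct, minlength=n_boxes)\n",
    "    # Per-box accuracy; boxes that never classified a sample get accuracy 0 here\n",
    "    box_acc = np.divide(n_right, n_used, out=np.zeros(n_boxes), where=n_used > 0)\n",
    "    keep = (n_used > 0) & (box_acc >= acc_threshold)\n",
    "    if keep_empty_boxes:\n",
    "        keep |= n_used == 0\n",
    "    return keep\n",
    "\n",
    "# Box 0 is right on 2 of 2 samples, box 1 on 0 of 2, and box 2 is never used\n",
    "toy_winners = [0, 0, 1, 1]\n",
    "toy_correct = [True, True, False, False]\n",
    "print(pruning_mask(toy_winners, toy_correct, n_boxes=3))\n",
    "print(pruning_mask(toy_winners, toy_correct, n_boxes=3, keep_empty_boxes=True))"
   ]
  },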
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction after pruning the combined single model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy after pruning the final model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred_2 = model_comb_bagging_subsampling.predict(X_test)\n",
    "acc_pruned = accuracy_score(y_test, y_pred_2)\n",
    "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### b. Training with pruning for base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_comb_bagging_subsampling_base_learner_pruning = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n",
    "model_comb_bagging_subsampling_base_learner_pruning.fit(Xtr, ytr, is_pruning_base_learners=True, X_val=X_val, y_val=y_val, acc_threshold=acc_threshold, keep_empty_boxes=keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training time: 8.254 (s)\n"
     ]
    }
   ],
   "source": [
    "print(\"Training time: %.3f (s)\"%(model_comb_bagging_subsampling_base_learner_pruning.elapsed_training_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of hyperboxes in all base learners = 2195\n"
     ]
    }
   ],
   "source": [
    "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes in the combined model = 388\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using majority voting from predicted results of all base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred_voting = model_comb_bagging_subsampling_base_learner_pruning.predict_voting(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy using voting of decisions from base learners =  95.61%\n"
     ]
    }
   ],
   "source": [
    "acc_voting = accuracy_score(y_test, y_pred_voting)\n",
    "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the final combined single model to make predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy of the combined model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred = model_comb_bagging_subsampling_base_learner_pruning.predict(X_test)\n",
    "acc = accuracy_score(y_test, y_pred)\n",
    "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply pruning for the final combined model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acc_threshold = 0.5 # minimum validation accuracy a hyperbox needs to survive pruning\n",
    "keep_empty_boxes = False # False: hyperboxes that take part in no prediction during pruning are also removed\n",
    "model_comb_bagging_subsampling_base_learner_pruning.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes of the combined single model after pruning = 383\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_subsampling_base_learner_pruning.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction after pruning the combined single model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy after pruning the final model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred_2 = model_comb_bagging_subsampling_base_learner_pruning.predict(X_test)\n",
    "acc_pruned = accuracy_score(y_test, y_pred_2)\n",
    "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Using random undersampling to generate class-balanced training sets for various base learners"
   ]
  },
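  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Random undersampling builds a class-balanced training subset by keeping the same number of samples from every class. The cell below is a minimal NumPy sketch of that idea, in which every class is cut down to the size of the rarest one. `class_balanced_indices` is a hypothetical helper, and how the library combines class balancing with `max_samples` is not shown here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def class_balanced_indices(y, rng):\n",
    "    # Hypothetical helper: undersample every class down to the size of the rarest class\n",
    "    y = np.asarray(y)\n",
    "    classes, counts = np.unique(y, return_counts=True)\n",
    "    n_per_class = counts.min()\n",
    "    # Draw n_per_class distinct rows from each class, without replacement\n",
    "    picked = [rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False) for c in classes]\n",
    "    return np.concatenate(picked)\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "y_toy = np.array([0] * 8 + [1] * 3)  # imbalanced toy labels\n",
    "balanced_idx = class_balanced_indices(y_toy, rng)\n",
    "print(np.bincount(y_toy[balanced_idx]))"
   ]
  },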
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### a. Training without pruning for base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise parameters\n",
    "n_estimators = 20 # number of base learners\n",
    "max_samples = 0.5 # sampling rate for samples\n",
    "bootstrap = False # random subsampling without replacement\n",
    "class_balanced = True # use the class-balanced sampling mode\n",
    "n_jobs = 4 # number of processes used to build base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Init a hyperbox-based model used to train base learners\n",
    "# Using the GFMM classifier with the original online learning algorithm with the maximum hyperbox size 0.1\n",
    "base_estimator = OnlineGFMM(theta=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Init a hyperbox-based model used to aggregate the resulting hyperboxes from all base learners\n",
    "# Using the accelerated agglomerative learning algorithm for the GFMM model to do this task\n",
    "model_level_estimator = AccelAgglomerativeLearningGFMM(theta=0.1, min_simil=0, simil_measure='long')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        class_balanced=True,\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_comb_bagging_class_balanced = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n",
    "model_comb_bagging_class_balanced.fit(Xtr, ytr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training time: 16.955 (s)\n"
     ]
    }
   ],
   "source": [
    "print(\"Training time: %.3f (s)\"%(model_comb_bagging_class_balanced.elapsed_training_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of hyperboxes in all base learners = 4010\n"
     ]
    }
   ],
   "source": [
    "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes in the combined model = 400\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using majority voting from predicted results of all base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred_voting = model_comb_bagging_class_balanced.predict_voting(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy using voting of decisions from base learners =  92.11%\n"
     ]
    }
   ],
   "source": [
    "acc_voting = accuracy_score(y_test, y_pred_voting)\n",
    "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the final combined single model to make predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy of the combined model =  92.98%\n"
     ]
    }
   ],
   "source": [
    "y_pred = model_comb_bagging_class_balanced.predict(X_test)\n",
    "acc = accuracy_score(y_test, y_pred)\n",
    "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply pruning for the final combined model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        class_balanced=True,\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acc_threshold = 0.5 # minimum validation accuracy a hyperbox needs to survive pruning\n",
    "keep_empty_boxes = False # False: hyperboxes that take part in no prediction during pruning are also removed\n",
    "model_comb_bagging_class_balanced.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes of the combined single model after pruning = 392\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_class_balanced.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction after pruning the combined single model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy after pruning the final model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred_2 = model_comb_bagging_class_balanced.predict(X_test)\n",
    "acc_pruned = accuracy_score(y_test, y_pred_2)\n",
    "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### b. Training with pruning for base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        class_balanced=True,\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_comb_bagging_class_balanced_base_learner_pruning = ModelCombinationBagging(base_estimator=base_estimator, model_level_estimator=model_level_estimator, n_estimators=n_estimators, max_samples=max_samples, bootstrap=bootstrap, class_balanced=class_balanced, n_jobs=n_jobs, random_state=0)\n",
    "model_comb_bagging_class_balanced_base_learner_pruning.fit(Xtr, ytr, is_pruning_base_learners=True, X_val=X_val, y_val=y_val, acc_threshold=acc_threshold, keep_empty_boxes=keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training time: 7.264 (s)\n"
     ]
    }
   ],
   "source": [
    "print(\"Training time: %.3f (s)\"%(model_comb_bagging_class_balanced_base_learner_pruning.elapsed_training_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of hyperboxes in all base learners = 2738\n"
     ]
    }
   ],
   "source": [
    "print('Total number of hyperboxes in all base learners = %d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes in the combined model = 395\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes in the combined model = %d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using majority voting from predicted results of all base learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred_voting = model_comb_bagging_class_balanced_base_learner_pruning.predict_voting(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy using voting of decisions from base learners =  94.74%\n"
     ]
    }
   ],
   "source": [
    "acc_voting = accuracy_score(y_test, y_pred_voting)\n",
    "print(f'Testing accuracy using voting of decisions from base learners = {acc_voting * 100 : .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the final combined single model to make predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy of the combined model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred = model_comb_bagging_class_balanced_base_learner_pruning.predict(X_test)\n",
    "acc = accuracy_score(y_test, y_pred)\n",
    "print(f'Testing accuracy of the combined model = {acc * 100: .2f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply pruning for the final combined model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ModelCombinationBagging(base_estimator=OnlineGFMM(C=array([], dtype=float64),\n",
       "                                                  V=array([], dtype=float64),\n",
       "                                                  W=array([], dtype=float64),\n",
       "                                                  theta=0.1),\n",
       "                        class_balanced=True,\n",
       "                        model_level_estimator=AccelAgglomerativeLearningGFMM(min_simil=0,\n",
       "                                                                             simil_measure='long',\n",
       "                                                                             theta=0.1),\n",
       "                        n_estimators=20, n_jobs=4, random_state=0)"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acc_threshold = 0.5 # minimum validation accuracy a hyperbox needs to survive pruning\n",
    "keep_empty_boxes = False # False: hyperboxes that take part in no prediction during pruning are also removed\n",
    "model_comb_bagging_class_balanced_base_learner_pruning.simple_pruning(X_val, y_val, acc_threshold, keep_empty_boxes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of hyperboxes of the combined single model after pruning = 100\n"
     ]
    }
   ],
   "source": [
    "print('Number of hyperboxes of the combined single model after pruning = %d'%model_comb_bagging_class_balanced_base_learner_pruning.get_n_hyperboxes_comb_model())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction after pruning the combined single model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Testing accuracy after pruning the final model =  94.74%\n"
     ]
    }
   ],
   "source": [
    "y_pred_2 = model_comb_bagging_class_balanced_base_learner_pruning.predict(X_test)\n",
    "acc_pruned = accuracy_score(y_test, y_pred_2)\n",
    "print(f'Testing accuracy after pruning the final model = {acc_pruned * 100: .2f}%')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}