Enhanced Online Learning Algorithm with K-nearest Hyperboxes Selection for FMNN
This example shows how to use Simpson’s Fuzzy Min-Max Neural Network (FMNN) classifier trained with an enhanced online learning algorithm with k-nearest hyperboxes selection (KNEFMNN).
Note that the numerical features in the training and testing datasets must lie in the range [0, 1], because the KNEFMNN classifier requires features in the unit cube.
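If your own data are not already scaled, min-max scaling is one common way to map them into the unit cube. The sketch below is a minimal NumPy version with illustrative helper names (`fit_min_max`, `to_unit_cube`) and toy data. It fits the minima and ranges on the training data only and clips transformed test data, because test values outside the training range would otherwise fall outside [0, 1].

```python
import numpy as np

def fit_min_max(X_train):
    """Return per-feature minima and ranges computed on the training data only."""
    mins = X_train.min(axis=0)
    ranges = X_train.max(axis=0) - mins
    # Guard against constant features, which would otherwise divide by zero
    ranges = np.where(ranges == 0, 1.0, ranges)
    return mins, ranges

def to_unit_cube(X, mins, ranges):
    """Scale X into [0, 1]; clip so unseen test values cannot leave the unit cube."""
    return np.clip((X - mins) / ranges, 0.0, 1.0)

# Toy data, for illustration only
X_train = np.array([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
X_test = np.array([[5.0, 40.0]])

mins, ranges = fit_min_max(X_train)
X_train_scaled = to_unit_cube(X_train, mins, ranges)
X_test_scaled = to_unit_cube(X_test, mins, ranges)  # 40.0 exceeds the training max, so it is clipped to 1.0
print(X_train_scaled)
print(X_test_scaled)
```

scikit-learn's `MinMaxScaler(clip=True)` provides the same fit-on-train, clip-on-test behaviour if you prefer an existing component.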
1. Execute directly from the Python file
[1]:
%matplotlib notebook
[2]:
import os
import warnings
warnings.filterwarnings('ignore')
Get the path to this Jupyter notebook file
[3]:
this_notebook_dir = os.path.dirname(os.path.abspath("__file__"))
this_notebook_dir
[3]:
'C:\\hyperbox-brain\\docs\\tutorials'
Get the home folder of the Hyperbox-Brain project
[4]:
from pathlib import Path
project_dir = Path(this_notebook_dir).parent.parent
project_dir
[4]:
WindowsPath('C:/hyperbox-brain')
Create the path to the Python file containing the implementation of the Simpson’s FMNN classifier using the enhanced online learning algorithm with k-nearest hyperboxes selection
[5]:
knefmnn_file_path = os.path.join(project_dir, Path("hbbrain/numerical_data/incremental_learner/knefmnn.py"))
knefmnn_file_path
[5]:
'C:\\hyperbox-brain\\hbbrain\\numerical_data\\incremental_learner\\knefmnn.py'
Run the file with the -h flag to show its usage instructions
[6]:
!python "{knefmnn_file_path}" -h
usage: knefmnn.py [-h] -training_file TRAINING_FILE -testing_file TESTING_FILE
                  [--theta THETA] [--gamma GAMMA] [--k_neighbors K_NEIGHBORS]
                  [--is_draw IS_DRAW]

The description of parameters

required arguments:
  -training_file TRAINING_FILE
                        A required argument for the path to training data file
                        (including file name)
  -testing_file TESTING_FILE
                        A required argument for the path to testing data file
                        (including file name)

optional arguments:
  --theta THETA         Maximum hyperbox size (in the range of (0, 1])
                        (default: 0.5)
  --gamma GAMMA         A sensitivity parameter describing the speed of
                        decreasing of the membership function in each
                        dimension (larger than 0) (default: 1)
  --k_neighbors K_NEIGHBORS
                        The number of nearest hyperboxes is considered for the
                        hyperbox expansion process
  --is_draw IS_DRAW     Show the existing hyperboxes during the training
                        process on the screen (default: False)
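The help text only outlines what `theta` and `k_neighbors` control. As an illustration, the hypothetical function `knn_online_step` below performs one online learning step for a single labelled sample. Among the hyperboxes of the sample's class, it takes at most `k` with the highest Simpson membership and expands the first one whose size stays within `theta` in every dimension; otherwise it creates a new point hyperbox. The per-dimension size test is an assumption, and the overlap test and contraction steps that FMNN-family learners apply after expansion are omitted, so this is a simplified sketch, not the hyperbox-brain implementation.

```python
import numpy as np

def simpson_membership(x, V, W, gamma=1.0):
    """Simpson's membership of point x (shape (n,)) in each hyperbox [V[j], W[j]] (shape (m, n))."""
    above = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, x - W)))
    below = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, V - x)))
    return (above + below).sum(axis=1) / (2 * x.shape[0])

def knn_online_step(x, label, V, W, C, theta=0.1, k=5, gamma=1.0):
    """Hypothetical single learning step with k-nearest hyperbox selection.

    Assumptions: per-dimension size test; no overlap test or contraction.
    Returns updated copies of V, W, C and the index of the hyperbox that absorbed x.
    """
    V, W, C = V.copy(), W.copy(), C.copy()  # do not mutate the caller's arrays
    same_class = np.flatnonzero(C == label)
    if same_class.size > 0:
        mem = simpson_membership(x, V[same_class], W[same_class], gamma)
        # Keep at most k candidates, ordered by decreasing membership
        candidates = same_class[np.argsort(-mem)[:k]]
        for j in candidates:
            new_v = np.minimum(V[j], x)
            new_w = np.maximum(W[j], x)
            # Assumed size test: every dimension of the expanded box must be <= theta
            if np.all(new_w - new_v <= theta):
                V[j], W[j] = new_v, new_w
                return V, W, C, j
    # No candidate can grow enough (or no box of this class exists): add a point hyperbox at x
    V = np.vstack([V, x])
    W = np.vstack([W, x])
    C = np.append(C, label)
    return V, W, C, len(C) - 1

# Toy example: one class-1 hyperbox
V0 = np.array([[0.20, 0.20]])
W0 = np.array([[0.25, 0.25]])
C0 = np.array([1])
V1, W1, C1, i1 = knn_online_step(np.array([0.28, 0.22]), 1, V0, W0, C0)  # expands box 0
V2, W2, C2, i2 = knn_online_step(np.array([0.90, 0.90]), 1, V1, W1, C1)  # too far: new box
print(i1, W1[0])
print(i2, len(C2))
```

Selecting several candidates instead of only the single best one lets the learner fall back to another nearby hyperbox when the closest one would violate the size limit, which tends to reduce the number of new hyperboxes created.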
Create the paths to the training and testing datasets stored in the dataset folder
[7]:
training_data_file = os.path.join(project_dir, Path("dataset/syn_num_train.csv"))
training_data_file
[7]:
'C:\\hyperbox-brain\\dataset\\syn_num_train.csv'
[8]:
testing_data_file = os.path.join(project_dir, Path("dataset/syn_num_test.csv"))
testing_data_file
[8]:
'C:\\hyperbox-brain\\dataset\\syn_num_test.csv'
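Because `project_dir` is already a `pathlib.Path`, the three paths above could equally be built with the `/` operator alone, without `os.path.join`. A small sketch with a hypothetical helper name, `tutorial_paths`; it uses `PurePosixPath` and a made-up root only so the example behaves the same on every platform, while in the notebook you would pass `project_dir` itself.

```python
from pathlib import PurePath, PurePosixPath

def tutorial_paths(project_dir: PurePath):
    """Build the script and dataset paths used in this tutorial from the project root."""
    script = project_dir / "hbbrain" / "numerical_data" / "incremental_learner" / "knefmnn.py"
    train_csv = project_dir / "dataset" / "syn_num_train.csv"
    test_csv = project_dir / "dataset" / "syn_num_test.csv"
    return script, train_csv, test_csv

# Hypothetical project root, for illustration only
script, train_csv, test_csv = tutorial_paths(PurePosixPath("/home/user/hyperbox-brain"))
print(script)
print(train_csv)
print(test_csv)
```

`pandas.read_csv` accepts path objects directly, so the results can be passed to it without converting to strings.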
Run a demo program
[9]:
!python "{knefmnn_file_path}" -training_file "{training_data_file}" -testing_file "{testing_data_file}" --theta 0.1 --k_neighbors 5 --gamma 1
Number of hyperboxes = 51
Testing accuracy = 86.60%
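Outside a notebook, where the `!` shell escape is unavailable, the same run can be launched with the standard `subprocess` module. The helper names below are hypothetical; the builder uses only the flags listed in the `-h` output above and leaves `--is_draw` at its default, since how the script parses that value is not shown here.

```python
import shlex
import subprocess
import sys

def build_knefmnn_command(script, training_file, testing_file,
                          theta=0.5, gamma=1.0, k_neighbors=None):
    """Assemble the argument list for knefmnn.py from the options in its -h output."""
    cmd = [sys.executable, str(script),
           "-training_file", str(training_file),
           "-testing_file", str(testing_file),
           "--theta", str(theta),
           "--gamma", str(gamma)]
    if k_neighbors is not None:  # the help text shows no default, so only pass it when given
        cmd += ["--k_neighbors", str(k_neighbors)]
    return cmd

def run_knefmnn(cmd):
    """Run the script and return its standard output (raises CalledProcessError on failure)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

cmd = build_knefmnn_command("knefmnn.py", "syn_num_train.csv", "syn_num_test.csv",
                            theta=0.1, gamma=1, k_neighbors=5)
print(shlex.join(cmd))
```

In the notebook, `print(run_knefmnn(build_knefmnn_command(knefmnn_file_path, training_data_file, testing_data_file, theta=0.1, gamma=1, k_neighbors=5)))` should print the same two summary lines as the `!python` cell. Passing a list rather than a single string avoids quoting problems with paths that contain spaces.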
2. Using the KNEFMNN classifier through its init, fit, and predict functions
[10]:
from hbbrain.numerical_data.incremental_learner.knefmnn import KNEFMNNClassifier
import pandas as pd
Create training and testing data sets
[11]:
df_train = pd.read_csv(training_data_file, header=None)
df_test = pd.read_csv(testing_data_file, header=None)
Xy_train = df_train.to_numpy()
Xy_test = df_test.to_numpy()
Xtr = Xy_train[:, :-1]
ytr = Xy_train[:, -1]
Xtest = Xy_test[:, :-1]
ytest = Xy_test[:, -1]
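Since the classifier expects every feature in [0, 1], it can be worth checking the loaded arrays before fitting. A minimal sketch with an illustrative helper name, `check_unit_cube`; in the notebook you would call it on `Xtr` and `Xtest`.

```python
import numpy as np

def check_unit_cube(X, name="X"):
    """Raise ValueError if any feature value is not finite or lies outside [0, 1]."""
    X = np.asarray(X, dtype=float)
    if not np.all(np.isfinite(X)):
        raise ValueError(f"{name} contains NaN or infinite values")
    lo, hi = float(X.min()), float(X.max())
    if lo < 0 or hi > 1:
        raise ValueError(f"{name} has values in [{lo:.3f}, {hi:.3f}], outside the unit cube")
    return lo, hi

# Toy input, for illustration only
print(check_unit_cube([[0.1, 0.9], [0.0, 1.0]], "toy"))
```

Failing early with a clear message is usually easier to debug than membership values that silently behave oddly for out-of-range inputs.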
Initializing parameters
[12]:
theta = 0.1
k_neighbors = 5
gamma = 1
is_draw = True
Training
[13]:
knefmnn_clf = KNEFMNNClassifier(theta=theta, k_neighbors=k_neighbors, gamma=gamma, is_draw=is_draw)
knefmnn_clf.fit(Xtr, ytr)
[13]:
KNEFMNNClassifier(C=array([1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1,
1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2,
1, 2, 2, 2, 2, 2, 1]),
V=array([[0.42413 , 0.53516 ],
[0.70577 , 0.397105 ],
[0.82785 , 0.78025 ],
[0.66038 , 0.51128 ],
[0.48794 , 0.672 ],
[0.26651 , 0.18424 ],
[0.32289 , 0.59994 ],
[0.19944 , 0.03 ],
[0.28077 , 0.26124 ],
[0.63683 , 0.6936 ],
[0.28822 , 0.55512 ],
[0.03 , 0.47757 ],
[0...
[0.91185 , 0.5761 ],
[0.2246 , 0.13567 ],
[0.25929 , 0.81558 ],
[0.815 , 0.397095 ],
[0.67906 , 0.83605 ],
[0.52197 , 0.91371 ],
[0.66037 , 0.57837 ],
[0.49408 , 0.66846 ],
[0.80583 , 0.43242 ],
[0.79935 , 0.7757 ],
[0.35813 , 0.58772 ],
[0.79516 , 0.32629 ],
[0.36057 , 0.71561 ],
[0.68469 , 0.29485 ],
[0.70743 , 0.50325 ],
[0.25621 , 0.62174 ],
[0.14737 , 0.28498 ],
[0.56487 , 0.17003 ],
[0.55763 , 0.43813 ]]),
is_draw=True, theta=0.1)
The code below displays the decision boundaries among classes; this works only when the input data are two-dimensional.
[14]:
knefmnn_clf.draw_hyperbox_and_boundary("The trained KNEFMNN classifier and its decision boundaries")
[15]:
print("Number of existing hyperboxes = %d"%(knefmnn_clf.get_n_hyperboxes()))
Number of existing hyperboxes = 51
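The hyperbox count can also be broken down per class. The repr above suggests the fitted model stores the class label of each hyperbox in an array `C`, so `np.unique(knefmnn_clf.C, return_counts=True)` would likely do it; that attribute name is inferred from the repr, not checked against the API. The self-contained sketch below uses the `C` array copied from that repr.

```python
import numpy as np

# Class labels of the 51 hyperboxes, copied from the C array in the fitted-model repr above
C = np.array([1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1,
              1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2,
              1, 2, 2, 2, 2, 2, 1])

labels, counts = np.unique(C, return_counts=True)
for label, count in zip(labels, counts):
    print(f"Class {label}: {count} hyperboxes")
print("Total:", C.size)
```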
Prediction
[16]:
from sklearn.metrics import accuracy_score
[17]:
y_pred = knefmnn_clf.predict(Xtest)
acc = accuracy_score(ytest, y_pred)
print(f'Accuracy = {acc * 100:.2f}%')
Accuracy = 86.60%
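Accuracy is the fraction of test samples whose predicted label equals the true label. The NumPy-only sketch below also breaks the predictions down per class with a confusion matrix; it runs on made-up toy labels, and in the notebook `ytest` and `y_pred` would take their place. `sklearn.metrics.confusion_matrix` gives the same matrix.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred):
    """Return (labels, confusion matrix with rows = true and cols = predicted, accuracy)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.union1d(y_true, y_pred)
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((labels.size, labels.size), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
    return labels, cm, accuracy

# Toy labels, for illustration only
labels, cm, acc = confusion_and_accuracy([1, 1, 2, 2, 2], [1, 2, 2, 2, 1])
print(labels)
print(cm)
print(f"Accuracy = {acc * 100:.2f}%")
```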
Explaining the predicted result for the input sample by showing membership values and hyperboxes for each class
[18]:
sample_need_explain = 10
y_pred_input_0, mem_val_classes, min_points_classes, max_points_classes = knefmnn_clf.get_sample_explanation(Xtest[sample_need_explain])
[19]:
print("Predicted class for sample X = [%f, %f] is %d and real class is %d" % (Xtest[sample_need_explain, 0], Xtest[sample_need_explain, 1], y_pred_input_0, ytest[sample_need_explain]))
Predicted class for sample X = [0.571640, 0.233700] is 2 and real class is 2
[20]:
print("Membership values:")
for key, val in mem_val_classes.items():
    print("Class %d has the maximum membership value = %f" % (key, val))
for key in min_points_classes:
    print("Class %d has the representative hyperbox: V = %s and W = %s" % (key, min_points_classes[key], max_points_classes[key]))
Membership values:
Class 1 has the maximum membership value = 0.964263
Class 2 has the maximum membership value = 0.990050
Class 1 has the representative hyperbox: V = [0.58339 0.3649 ] and W = [0.66091 0.38616125]
Class 2 has the representative hyperbox: V = [0.57285 0.27229] and W = [0.66773 0.36489]
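The printed membership values are consistent with Simpson's original membership function with `gamma = 1`, averaged over both sides of every dimension. The sketch below recomputes both class memberships from the sample and the representative hyperboxes printed above and predicts the class with the larger value; with these inputs it agrees with the printed values to about six decimal places. Treat it as a cross-check of the numbers, not as the library's own code path.

```python
import numpy as np

def simpson_membership(x, v, w, gamma=1.0):
    """Simpson's FMNN membership of point x in the hyperbox with min point v and max point w.

    b(x) = 1/(2n) * sum_i [ max(0, 1 - max(0, gamma * min(1, x_i - w_i)))
                          + max(0, 1 - max(0, gamma * min(1, v_i - x_i))) ]
    """
    x, v, w = (np.asarray(a, dtype=float) for a in (x, v, w))
    above = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, x - w)))  # penalty for exceeding w
    below = np.maximum(0, 1 - np.maximum(0, gamma * np.minimum(1, v - x)))  # penalty for falling below v
    return (above + below).sum() / (2 * x.size)

# Sample and representative hyperboxes (V, W) copied from the printed explanation above
x = [0.57164, 0.2337]
boxes = {
    1: ([0.58339, 0.3649], [0.66091, 0.38616125]),
    2: ([0.57285, 0.27229], [0.66773, 0.36489]),
}

memberships = {c: simpson_membership(x, v, w, gamma=1) for c, (v, w) in boxes.items()}
for c, m in memberships.items():
    print(f"Class {c}: {m:.6f}")
print("Predicted class:", max(memberships, key=memberships.get))
```

Because the sample lies below both hyperboxes in each dimension, only the `v - x` terms reduce the membership; the class-2 hyperbox is closer, so class 2 wins, matching the predicted class above.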
Show the input sample and the hyperboxes belonging to each class. In 2D, we can either draw rectangles or use parallel coordinates.
Using rectangles to show explanations
[21]:
knefmnn_clf.show_sample_explanation(Xtest[sample_need_explain], Xtest[sample_need_explain], min_points_classes, max_points_classes, y_pred_input_0, "2D")
Using parallel coordinates. This mode works for inputs of any dimensionality.
[22]:
# Create a parallel coordinates graph
knefmnn_clf.show_sample_explanation(Xtest[sample_need_explain], Xtest[sample_need_explain], min_points_classes, max_points_classes, y_pred_input_0, file_path="par_cord/knefmnn_par_cord.html")
[23]:
# Load parallel coordinates to display on the notebook
from IPython.display import IFrame
# We load the parallel coordinates from GitHub here for demonstration on Read the Docs
# In a local notebook, we only need to load the graph stored at 'par_cord/knefmnn_par_cord.html'
IFrame('https://uts-caslab.github.io/hyperbox-brain/docs/tutorials/par_cord/knefmnn_par_cord.html', width=820, height=520)
[23]: