I have data that looks like this:
X000 X001 X002 X003 X004 X005 X006 X007 X008 X009 ... X384 X385 X386 X387 X388 X389 X390 X391 X392 Y
0 0.435294 0.568627 0.470588 0.239216 0.062745 0.003922 0.000000 0.000000 0.133333 0.321569 ... 0.694118 0.000000 0.000000 0.000000 0.000000 0.000000 0.054902 0.243137 0.537255 5
1 0.290196 0.192157 0.098039 0.039216 0.011765 0.000000 0.000000 0.000000 0.482353 0.627451 ... 0.278431 0.007843 0.019608 0.035294 0.054902 0.082353 0.133333 0.141176 0.098039 0
2 0.000000 0.000000 0.000000 0.000000 0.003922 0.027451 0.082353 0.184314 0.000000 0.000000 ... 0.380392 0.007843 0.050980 0.164706 0.301961 0.400000 0.443137 0.415686 0.360784 4
3 0.745098 0.952941 0.870588 0.329412 0.035294 0.000000 0.000000 0.000000 0.000000 0.058824 ... 0.000000 0.015686 0.211765 0.341176 0.800000 0.101961 0.000000 0.000000 0.000000 1
4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 9
X000 to X392 are the feature columns, and Y is the label I'm trying to classify.
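For reference, this is roughly how I plan to split the frame into features, labels, and a hold-out validation set (the file name and variable names are just placeholders):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('my_data.csv')               # placeholder file name
X = df.drop(columns='Y').values               # X000 .. X392 -> shape (n_samples, 393)
y = df['Y'].values
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=7)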
I want to build an ensemble of 5 models with the following constraints:
- The base classifier needs to be a fully connected neural network
- For the ensemble, I need to use neural networks with different hyper-parameters
- Ideally there should be an NN factory that receives a list of neurons per layer (see the sketch below)
- The ensemble can be simple bagging or majority vote
- The ensemble needs to be a plain class, not inherited from PyTorch/Keras
- 5 models
- Accuracy to measure model performance
- During training, training and validation accuracies need to be plotted
I have never architected a model like this before. May I get some help on how I should go about it?
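To make the question concrete, here is a rough sketch of what I imagine the factory and ensemble class could look like. I'm assuming PyTorch for the base networks, and the names (make_mlp, MajorityVoteEnsemble) plus the defaults (393 inputs from X000..X392, 10 classes guessed from my Y values) are just my own placeholders — I don't know if this is the right structure:

import numpy as np
import torch
import torch.nn as nn

def make_mlp(n_inputs, hidden_sizes, n_classes):
    # hypothetical factory: builds a fully connected net from a list of neurons per layer
    layers = []
    in_features = n_inputs
    for size in hidden_sizes:
        layers += [nn.Linear(in_features, size), nn.ReLU()]
        in_features = size
    layers.append(nn.Linear(in_features, n_classes))
    return nn.Sequential(*layers)

class MajorityVoteEnsemble:  # plain Python class, does not inherit from nn.Module
    def __init__(self, layer_configs, n_inputs=393, n_classes=10, epochs=50, lr=1e-3):
        # one network per config, e.g. [[256, 128], [128, 64], ...] gives differently sized models
        self.models = [make_mlp(n_inputs, cfg, n_classes) for cfg in layer_configs]
        self.epochs = epochs
        self.lr = lr

    @staticmethod
    def _accuracy(model, X, y):
        model.eval()
        with torch.no_grad():
            preds = model(torch.tensor(X, dtype=torch.float32)).argmax(dim=1).numpy()
        return float((preds == y).mean())

    def fit(self, X_train, y_train, X_val, y_val):
        history = []  # per model: list of (train_acc, val_acc) per epoch, for plotting
        for model in self.models:
            # bagging: each model trains on its own bootstrap sample of the training data
            idx = np.random.randint(0, len(X_train), size=len(X_train))
            Xb = torch.tensor(X_train[idx], dtype=torch.float32)
            yb = torch.tensor(y_train[idx], dtype=torch.long)
            opt = torch.optim.Adam(model.parameters(), lr=self.lr)
            loss_fn = nn.CrossEntropyLoss()
            accs = []
            for _ in range(self.epochs):
                model.train()
                opt.zero_grad()
                loss = loss_fn(model(Xb), yb)
                loss.backward()
                opt.step()
                accs.append((self._accuracy(model, X_train, y_train),
                             self._accuracy(model, X_val, y_val)))
            history.append(accs)
        return history  # plot training/validation accuracy curves from this with matplotlib

    def predict(self, X):
        # majority vote over each model's predicted class
        with torch.no_grad():
            votes = np.stack([
                m(torch.tensor(X, dtype=torch.float32)).argmax(dim=1).numpy()
                for m in self.models
            ])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

I would then call it with something like MajorityVoteEnsemble([[256, 128], [128, 64], [512, 256], [64, 64], [300, 100]]) to get 5 models with different hyper-parameters, but I'm not sure whether this overall layout is sensible, or how the training and plotting parts are normally organized.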
For context, I do have previous experience doing something similar with classical machine learning:
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier

# Load the dataset and split into features (first 8 columns) and label (last column)
dataframe = pd.read_csv('pima-indians-diabetes.csv')
array = dataframe.values
X = array[:, 0:8]
Y = array[:, 8]

seed = 7
num_trees = 30

# 10-fold cross-validation with AdaBoost
kfold1 = model_selection.KFold(n_splits=10)
model1 = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
results1 = model_selection.cross_val_score(model1, X, Y, cv=kfold1)
print('AdaBoost: ', results1.mean())

# 10-fold cross-validation with XGBoost
kfold2 = model_selection.KFold(n_splits=10)
model2 = XGBClassifier(n_estimators=num_trees, random_state=seed)
results2 = model_selection.cross_val_score(model2, X, Y, cv=kfold2)
print('XGBoost: ', results2.mean())