PyTorch Model Ensembler for Convolutional Neural Networks (CNN's)

Dear All,
As a service to the community, I decided to provide all my PyTorch ensembling code on GitHub.

Here, we investigate the effect of PyTorch model ensembles built by combining the top-N single models crafted during the training phase. The results demonstrate that model ensembles can significantly outperform conventional single-model approaches. Moreover, the method constructs an ensemble of deep CNN models with different architectures that are complementary to each other.

Ensemble learning:
Ensemble learning is a technique that combines several models to solve a particular classification problem. Ensemble methods seek to promote diversity among the models they combine, which reduces overfitting to the training data set. The outputs of the individual models in the ensemble are combined (e.g. by averaging) to form the final prediction.

During inference, the responses of the individual ConvNets in the ensemble are averaged to form the final classification. Both model averaging and overfitting prevention are well-studied techniques in the machine learning community.
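The averaging step above can be sketched as follows. This is a minimal illustration, not code from the repository: the `TinyCNN` placeholder and the `ensemble_predict` helper are my own names, and the real ensemble would use full-sized architectures loaded from checkpoints.

```python
import torch
import torch.nn as nn

# Placeholder CNN standing in for the real ensemble members.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (8, 1, 1)
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the class probabilities of all ensemble members."""
    for m in models:
        m.eval()
    # Stack per-model softmax outputs: (n_models, batch, num_classes).
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0)  # (batch, num_classes)

models = [TinyCNN() for _ in range(3)]
x = torch.randn(4, 3, 32, 32)
avg_probs = ensemble_predict(models, x)
print(avg_probs.shape)  # torch.Size([4, 2])
```

Averaging probabilities (after softmax) rather than raw logits keeps each member's contribution on the same scale; averaging logits is a common alternative.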

If you want to investigate image classification by ensembling models, this repository will help you do so. It shows how to perform CNN ensembling in PyTorch with publicly available data sets. It is based on many hours of debugging and a bunch of official PyTorch tutorials/examples.

I felt that performing ensembling in PyTorch was not exactly trivial, so I decided to release my code as a tutorial, which I originally wrote for my Kaggle work. Using my code coupled with smart ensembling, you can reach the top ~50 on the current leaderboard.

Do note that there may be bugs and errors, but it should run out of the box provided it is pointed to the correct location of the data set on your local machine: parser.add_argument('--data_path', default='d:/db/data/ice/', type=str, help='Path to dataset')
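For context, the argument above would sit in a small argparse entry point along these lines. This is a sketch of how such a CLI is typically wired up, not the repository's actual main script; only the `--data_path` argument and its default come from the post.

```python
import argparse

# Minimal CLI sketch; --data_path mirrors the snippet quoted above.
parser = argparse.ArgumentParser(description='PyTorch CNN ensembling')
parser.add_argument('--data_path', default='d:/db/data/ice/', type=str,
                    help='Path to dataset')

# Passing an empty list uses the defaults (handy in a notebook);
# a real run would call parser.parse_args() to read sys.argv.
args = parser.parse_args([])
print(args.data_path)  # d:/db/data/ice/
```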

In addition, many parts of the code are copied from other open source projects such as SENet, DenseNet, etc.; all credit belongs to the original authors.

Feel free to clone/use/comment, etc., and I am always here to answer questions (in my free time …)



Very good. Do you ensemble during training or just at inference? How do the gradient updates work if you ensemble during training?


I want to know the answer