3D CNN models ensemble

Hi all,

I’m trying to solve a video recognition problem using 3D CNNs.
I want to classify the videos into 6 classes. I tried training an end-to-end 3D CNN model, but it didn’t give me good results (around 40% accuracy), so I decided to try a different approach: training 6 binary classification models, one for each class.
Each of the 6 models gave me good accuracy and low loss on its own.
Now I want to ensemble all 6 models together, but the problem is that each of them has its own mean and std normalization.
My question is: I want to load a batch of videos and normalize it according to the model it is about to be processed with. I have 6 models, so I need to normalize the videos 6 times.
What is the best way to do that? Do I need to create a different dataset for each model and normalize it accordingly, or is there a simpler and more efficient way?

Best regards,
Yana

What do you mean by “each model has its own mean and std normalization”?
Usually, you normalize the dataset and the statistics do not depend on the model, i.e. the mean and std are calculated from the training data.
Could you explain a bit more about your use case?
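A minimal sketch of what "calculated from the training data" means in practice, assuming each clip is a float tensor shaped (C, T, H, W); the name `channel_mean_std` is hypothetical. It accumulates per-channel running sums, so the clips can be streamed one at a time instead of loading the whole training set into memory:

```python
import torch


def channel_mean_std(clips):
    """Per-channel mean and (population) std over an iterable of clips shaped (C, T, H, W).

    Running sums are accumulated in float64, so clips can be streamed one at a time.
    """
    total, total_sq, count = 0, 0, 0
    for clip in clips:
        flat = clip.double().flatten(start_dim=1)  # (C, T*H*W)
        total = total + flat.sum(dim=1)
        total_sq = total_sq + (flat ** 2).sum(dim=1)
        count += flat.shape[1]
    mean = total / count
    # Var = E[x^2] - E[x]^2; clamp guards against tiny negative values from rounding
    std = (total_sq / count - mean ** 2).clamp_min(0).sqrt()
    return mean.float(), std.float()
```

The returned tensors have shape (C,); view them as (1, C, 1, 1, 1) to broadcast over a batch shaped (N, C, T, H, W).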

Sure @ptrblck,
I’m trying to solve a 6-class video classification problem. I tried an end-to-end model (I3D), but it gave me bad results (42% accuracy), so I decided to solve 6 different classification problems instead (binary instead of multi-class).
The model for class A outputs 1 if the video belongs to class A and 0 if it doesn’t; likewise, I trained a model for class B that predicts 1 if the video belongs to class B and 0 otherwise, and so on for all 6 classes.
Each of these 6 models has its own mean and std (calculated on the training set), but I computed them a little differently: for each model, I calculated the mean and std only on the samples of its specific class. For example, for model 1, which predicts class A, I split my training set into positive samples (label 1, all videos belonging to class A) and negative samples (label 0, all videos that are not class A), and computed the statistics on the positives. That’s how I calculated the mean and std for each of my classes.
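The class-specific statistics described above can be sketched as follows, assuming the training clips fit in memory as one tensor of shape (N, C, T, H, W) with one integer class label per video. `per_class_stats` is a hypothetical helper, not part of the original code. For model k, only the positive samples (label k) contribute to its mean and std:

```python
import torch


def per_class_stats(clips, labels, num_classes):
    """For each class k, per-channel mean/std computed over the clips labelled k only.

    clips:  float tensor (N, C, T, H, W)
    labels: long tensor (N,) with values in [0, num_classes)
    Returns a list of (mean, std) pairs, each of shape (C,).
    """
    channels = clips.shape[1]
    stats = []
    for k in range(num_classes):
        positives = clips[labels == k]  # positive samples for binary model k
        if positives.shape[0] == 0:
            raise ValueError(f"class {k} has no training samples")
        flat = positives.transpose(0, 1).reshape(channels, -1)  # (C, N_k*T*H*W)
        mean = flat.mean(dim=1)
        std = (flat - mean[:, None]).pow(2).mean(dim=1).sqrt()  # population std
        stats.append((mean, std))
    return stats
```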

I wrote a wrapper Module that takes all 6 models, runs inference with each one individually, and concatenates the results into a Linear layer with 6 outputs (one per class I want to predict). When I train this model, inference takes a lot of time and, for some reason, a lot of memory.

I’m attaching my code:

import os,sys,inspect

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0,parentdir)

import torch
from torch import nn
from Utils.Transforms import ToTensor
from torch.autograd import Variable


class NetEnsemble(nn.Module):
	"""docstring for NetEnsemble"""
	def __init__(self, batch_size, use_half, ensemble_models=None, normalizations=[]):
		super(NetEnsemble, self).__init__()
		to_tensor = ToTensor()

		self.ensemble_models = nn.ModuleList(ensemble_models)
		self.log_softmax = nn.LogSoftmax(1)
		self.fc = nn.Linear(12, 6)

		self.register_buffer('net1_mean', to_tensor(normalizations[0][0]).unsqueeze(0))
		self.register_buffer('net1_std', to_tensor(normalizations[0][1]).unsqueeze(0))
		self.register_buffer('net2_mean', to_tensor(normalizations[1][0]).unsqueeze(0))
		self.register_buffer('net2_std', to_tensor(normalizations[1][1]).unsqueeze(0))
		self.register_buffer('net3_mean', to_tensor(normalizations[2][0]).unsqueeze(0))
		self.register_buffer('net3_std', to_tensor(normalizations[2][1]).unsqueeze(0))
		self.register_buffer('net4_mean', to_tensor(normalizations[3][0]).unsqueeze(0))
		self.register_buffer('net4_std', to_tensor(normalizations[3][1]).unsqueeze(0))
		self.register_buffer('net5_mean', to_tensor(normalizations[4][0]).unsqueeze(0))
		self.register_buffer('net5_std', to_tensor(normalizations[4][1]).unsqueeze(0))
		self.register_buffer('net6_mean', to_tensor(normalizations[5][0]).unsqueeze(0))
		self.register_buffer('net6_std', to_tensor(normalizations[5][1]).unsqueeze(0))

	def forward(self, x):
		log_softmaxs = []
		softmaxs = []

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net1_mean, volatile=True))/Variable(self.net1_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[0](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net2_mean, volatile=True))/Variable(self.net2_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[1](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net3_mean, volatile=True))/Variable(self.net3_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[2](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net4_mean, volatile=True))/Variable(self.net4_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[3](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net5_mean, volatile=True))/Variable(self.net5_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[4](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		#normalize x according to the model mean and std
		x_normalized = (x - Variable(self.net6_mean, volatile=True))/Variable(self.net6_std, volatile=True)
		#do inference on the normalized input
		log_softmax, softmax = self.ensemble_models[5](x_normalized)
		log_softmaxs.append(log_softmax)
		softmaxs.append(softmax)

		out = torch.cat(log_softmaxs, 1)
		out = out.view(out.size(0), -1) #flatten outputs
		out = self.fc(out)
		
		out.volatile = False
		out.requires_grad = True
		
		return self.log_softmax(out)
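For reference, here is a sketch of the same wrapper written for current PyTorch, where `Variable` and `volatile` no longer exist and `torch.no_grad()` takes their place. It assumes per-channel (mean, std) pairs and that each binary model returns `(log_softmax, softmax)` as in the code above. Running the six backbones under `no_grad` means none of their activations are kept for the backward pass, which is usually where most of the memory goes, while `fc` sits outside the `no_grad` block so it still receives gradients. (In the version above, the volatile inputs likely make the output of `self.fc` volatile too, so setting `out.requires_grad = True` afterwards would leave `self.fc` without gradients.)

```python
import torch
from torch import nn


class NetEnsemble(nn.Module):
    """Frozen binary models, each with its own normalization, fused by a trainable Linear layer."""

    def __init__(self, ensemble_models, normalizations):
        super().__init__()
        self.ensemble_models = nn.ModuleList(ensemble_models)
        for p in self.ensemble_models.parameters():
            p.requires_grad_(False)  # the binary models stay frozen; only self.fc is trained
        for i, (mean, std) in enumerate(normalizations):
            # assumed per-channel stats, reshaped to (1, C, 1, 1, 1) to broadcast over (N, C, T, H, W)
            self.register_buffer(f"mean{i}", torch.as_tensor(mean, dtype=torch.float32).view(1, -1, 1, 1, 1))
            self.register_buffer(f"std{i}", torch.as_tensor(std, dtype=torch.float32).view(1, -1, 1, 1, 1))
        # each binary model contributes 2 log-probabilities, hence 2 * 6 = 12 inputs
        self.fc = nn.Linear(2 * len(self.ensemble_models), len(self.ensemble_models))

    def train(self, mode=True):
        super().train(mode)
        self.ensemble_models.eval()  # keep BatchNorm/Dropout of the frozen models in eval mode
        return self

    def forward(self, x):
        scores = []
        with torch.no_grad():  # no activations of the backbones are stored for backward
            for i, model in enumerate(self.ensemble_models):
                x_normalized = (x - getattr(self, f"mean{i}")) / getattr(self, f"std{i}")
                log_softmax, _ = model(x_normalized)  # assumes (log_softmax, softmax) outputs
                scores.append(log_softmax)
        out = torch.cat(scores, dim=1)  # (N, 12), carries no autograd graph
        return torch.log_softmax(self.fc(out), dim=1)  # graph starts here, so fc gets gradients
```

Overriding `train()` keeps the frozen models in eval mode, so their BatchNorm running statistics are not updated while only `fc` is being trained.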

Ok, interesting idea.
So as far as I understand your approach, each model uses its own mean and std, which were calculated on the positive samples of the corresponding class. Am I right?

Did this approach outperform 6 different models using a global mean and std?

However, you could move the standardization into the Dataset and return 6 differently normalized samples.
This way, you could push some of the computation into the DataLoader workers (i.e. onto the CPU) while your model ensemble calculates the predictions.
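A sketch of moving the standardization into the Dataset, as suggested above. `MultiNormDataset` is a hypothetical wrapper around any dataset that returns `(clip, label)` with `clip` shaped (C, T, H, W); the statistics are assumed to be per-channel (mean, std) pairs, one per binary model:

```python
import torch
from torch.utils.data import Dataset


class MultiNormDataset(Dataset):
    """Wraps a (clip, label) dataset and returns one normalized copy of the clip per model."""

    def __init__(self, base, normalizations):
        self.base = base
        # per-channel stats reshaped to (C, 1, 1, 1) so they broadcast over (C, T, H, W)
        self.stats = [
            (torch.as_tensor(m, dtype=torch.float32).view(-1, 1, 1, 1),
             torch.as_tensor(s, dtype=torch.float32).view(-1, 1, 1, 1))
            for m, s in normalizations
        ]

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        clip, label = self.base[idx]
        # (num_models, C, T, H, W): this runs inside the DataLoader workers, i.e. on the CPU
        views = torch.stack([(clip - m) / s for m, s in self.stats])
        return views, label
```

With the default collate function, a batch has shape (N, num_models, C, T, H, W), so model `i` consumes `batch[:, i]`. Each sample now holds one copy per model, so host memory per batch grows by that factor.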

What is the overall accuracy of the model ensemble compared to the first model (~40% accuracy)?

@ptrblck, yes, you are right!
Yes. The overall accuracy was around 40% when I used a global mean and std of the entire training set. The class-specific statistics work better, since they highlight the difference between the videos in my positive and negative samples (I get ~75% accuracy for each model, so the average is quite good, I think).

I thought about moving the standardization into the Dataset but was afraid it would blow up my RAM (the machine I’m using for experiments currently has only 64 GB of RAM and 4 GTX 1080 Ti GPUs).
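To put the RAM concern into numbers, a back-of-the-envelope calculation. It assumes float32 clips of 64 RGB frames at 224x224 (a common I3D input size; the actual clip size used here is not stated) and a hypothetical batch size of 8:

```python
def clip_bytes(channels=3, frames=64, height=224, width=224, bytes_per_value=4):
    """Size in bytes of one clip stored as a dense tensor (float32 = 4 bytes per value)."""
    return channels * frames * height * width * bytes_per_value


MIB = 2 ** 20
one_copy = clip_bytes()
six_copies = 6 * one_copy  # one normalized copy per binary model
print(f"one normalized copy: {one_copy / MIB:.2f} MiB")
print(f"six copies per sample: {six_copies / MIB:.2f} MiB")
print(f"batch of 8 samples: {8 * six_copies / MIB:.0f} MiB")
```

Under these assumptions, one copy is 36.75 MiB, six copies are 220.5 MiB per sample, and a batch of 8 is about 1764 MiB. Every batch prefetched by the DataLoader workers multiplies this again, whereas normalizing inside `forward` on the GPU creates each copy only while it is needed.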

I tried running inference with my 6 models one after the other and then feeding the concatenated results into another module that just has a Linear layer with 6 outputs, but I ran out of GPU memory and the RAM almost blew up.

I’m just afraid that my 6 models (I’m using I3D) are too big to run inference together on the GPU (maybe I’m wrong).

I can’t report the overall accuracy of the ensemble yet, since I’m still trying to make it work and keep getting out-of-memory errors.

I assume you load all your models and push them onto different GPUs?
How big is each model?

Since you have 6 models and 4 GPUs, 2 GPUs will have 2 models on them?
You are probably running out of memory on these two?
If so, you could try to relocate the models after each operation, i.e.:

def forward(self, x):
    ...
    self.ensemble_models[0].cuda(0)
    log_softmax, softmax = self.ensemble_models[0](x_normalized)
    self.ensemble_models[0].cpu()
    ...

This will obviously slow down your ensemble, so you could try running it on CPU from the beginning.
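The relocation idea above can be written as a loop. This sketch assumes inference only (no gradients for the binary models), that each model returns a single tensor of scores, and that the models start on the CPU; `offloaded_inference` is a hypothetical name:

```python
import torch


@torch.no_grad()
def offloaded_inference(models, inputs, device):
    """Run each model on `device` one at a time, moving it back to the CPU afterwards.

    models: list of nn.Module kept on the CPU; inputs: list of tensors, one per model
    (e.g. the differently normalized copies). Only one model's weights occupy the
    device at any moment, at the cost of a host-to-device copy on every call.
    """
    outputs = []
    for model, x in zip(models, inputs):
        model.to(device).eval()
        outputs.append(model(x.to(device)).cpu())
        model.cpu()  # free the device memory before the next model is moved over
    return torch.cat(outputs, dim=1)
```

Passing `torch.device("cpu")` runs the whole ensemble on the CPU, which is the slower but memory-safe option mentioned above.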

I’m still interested in the final performance: even though each model can predict its own class, the ensemble might fail when, for example, two models both predict a very high probability.
I assume you’ve trained each model separately on its positive/negative samples.

I actually wrapped all my models in DataParallel, so they are spread over all 4 of my GPUs.
Each model has ~12M parameters.

I tried running inference with the 6 models one after the other in the main training loop and collecting the scores into a tiny model that contains only 1 Linear layer, and now my RAM is exploding while the GPU memory is being released.
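One possible cause of host RAM growing while GPU memory is freed is keeping references to tensors that still carry an autograd graph, e.g. appending model outputs or the loss to a Python list without `.detach()` or `.item()`. A sketch of an alternative, a swapped-in two-stage approach rather than what was done in this thread: run the frozen models once under `torch.no_grad()`, cache only their small score vectors, and train the Linear layer on those. It assumes a loader yielding `(views, labels)` with `views` shaped (N, num_models, C, T, H, W) and models that return a single tensor; `extract_scores` and `train_fusion` are hypothetical names:

```python
import torch
from torch import nn


@torch.no_grad()
def extract_scores(models, loader, device):
    """Run the frozen binary models once and keep only their small score vectors."""
    for m in models:
        m.eval()
    feats, targets = [], []
    for views, labels in loader:
        scores = [m(views[:, i].to(device)) for i, m in enumerate(models)]
        feats.append(torch.cat(scores, dim=1).cpu())  # created under no_grad, so no graph attached
        targets.append(labels)
    return torch.cat(feats), torch.cat(targets)


def train_fusion(feats, targets, num_classes, epochs=50, lr=0.1):
    """Train a single Linear layer on the cached scores (full-batch gradient descent)."""
    fc = nn.Linear(feats.shape[1], num_classes)
    opt = torch.optim.SGD(fc.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(fc(feats), targets)
        loss.backward()
        opt.step()
    return fc, loss.item()  # .item() returns a Python float, so no graph is kept alive
```

Since the cached features are only a few floats per video, the fusion layer can be trained for many epochs without touching the I3D models again.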

Hello, did you find a solution to this problem? I’m facing exactly the same problem: the RAM explodes, while the GPU memory does get released.