Loss Value problem

I applied an LSTM on top of CNN features. I trained for 10 epochs, but the loss value does not decrease; it usually stays around 1.60.
I could not find where the problem is. Please help me.
I have to set the mini-batch size to 1 because of

cuda runtime error (2) : out of memory

I have 5 classes and 300 videos in total. The videos have different lengths, and the maximum length is 600.
Lr: 1e-4
Lstm-layers: 1
Hidden size: 512
Fc-size: 1024

I scaled the hidden size down to 64, but the result did not change.
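As a reference point for the 1.60 figure: assuming the loss is cross-entropy (the post does not say which loss is used), a model that assigns equal probability to all 5 classes scores ln(5) ≈ 1.609. A loss stuck near that value would be consistent with predictions that are still at chance level. A minimal check in plain Python, where `cross_entropy` is an illustrative helper and not part of the model code:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw logits (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


num_classes = 5  # from the post

# Identical logits give a uniform prediction over the classes
uniform_logits = [0.0] * num_classes
chance_loss = cross_entropy(uniform_logits, target=2)

print(f"chance-level loss: {chance_loss:.4f}")
print(f"ln(num_classes):   {math.log(num_classes):.4f}")
```

Both printed values should agree, since the cross-entropy of a uniform prediction over N classes is ln(N) regardless of the target.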

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, original_model, arch, num_classes, lstm_layers, hidden_size, fc_size):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.fc_size = fc_size
        if arch.startswith('resnet50'):
            # Pretrained CNN without its last layer, used as a frozen feature extractor
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            for param in self.features.parameters():
                param.requires_grad = False
            self.fc_pre = nn.Sequential(nn.Linear(2048, fc_size), nn.Dropout())
            self.rnn = nn.LSTM(input_size=fc_size,
                               hidden_size=hidden_size,
                               num_layers=lstm_layers,
                               batch_first=True)
            self.fc = nn.Linear(hidden_size, num_classes)
        else:
            raise Exception("This architecture has not been supported yet")

    def init_hidden(self, num_layers, batch_size):
        # Zero-initialized (h_0, c_0) pair for the LSTM
        return (torch.zeros(num_layers, batch_size, self.hidden_size).cuda(),
                torch.zeros(num_layers, batch_size, self.hidden_size).cuda())

    def forward(self, inputs, hidden=None, steps=0):
        # inputs is indexed by time step first: inputs[i] is the batch of frames at step i
        length = len(inputs)
        fs = torch.zeros(inputs[0].size(0), length, self.rnn.input_size).cuda()

        # Extract per-frame CNN features and stack them as (batch, length, fc_size)
        for i in range(length):
            f = self.features(inputs[i])
            f = f.view(f.size(0), -1)
            f = self.fc_pre(f)
            fs[:, i, :] = f

        outputs, hidden = self.rnn(fs, hidden)
        # Per-time-step class scores: (batch, length, num_classes)
        outputs = self.fc(outputs)
        return outputs

Can you help me, @albanD, @ptrblck?


It is very hard to say, I am afraid, as we don’t have any way to reproduce the issue or see the code.

Things that you want to check are:

  • Are gradients flowing properly? Do the params in your net have .grad fields populated with non-zero values?
  • Is the optimizer set up properly? Are the params it holds the same ones whose gradients are populated?
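
Both checks can be scripted. The following is a minimal sketch, not taken from the thread: `grad_report` and `params_missing_from_optimizer` are hypothetical helper names, and the tiny `nn.Sequential` model is only a stand-in so the snippet runs on its own. In practice you would pass your own model and optimizer after one `loss.backward()` call.

```python
import torch
import torch.nn as nn


def grad_report(model):
    """Map each trainable parameter name to its gradient norm (None if no grad)."""
    report = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen parameters (e.g. a fixed CNN backbone) are skipped
        report[name] = None if param.grad is None else param.grad.norm().item()
    return report


def params_missing_from_optimizer(model, optimizer):
    """Names of trainable parameters that the optimizer does not hold."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return [name for name, p in model.named_parameters()
            if p.requires_grad and id(p) not in held]


# Stand-in model and data (assumptions, for illustration only)
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss = nn.CrossEntropyLoss()(model(torch.randn(2, 8)), torch.tensor([0, 3]))
optimizer.zero_grad()
loss.backward()

print(grad_report(model))
print(params_missing_from_optimizer(model, optimizer))
```

A `None` or a norm of exactly 0.0 in the report points at a layer that receives no gradient; a non-empty list from the second helper points at trainable parameters the optimizer will never update.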

I tested the model with the HMDB51 dataset, and the loss decreases there, so the model is OK.
In HMDB51, the minimum number of frames is 18, the maximum is 1062, and the average is 92.
In my dataset, the maximum number of frames is 1115 and the minimum is 550.
So, by comparison, my own dataset has many more frames per video.
I changed the parameters but could not reduce the loss on my dataset. Can you suggest a method, given that the average number of frames is high?
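
One common option for long clips, offered here as a suggestion rather than a confirmed fix for this case, is to subsample each video to a fixed number of evenly spaced frames before it reaches the LSTM. That shortens the sequences and may also free enough memory for a mini-batch size above 1. A minimal sketch in plain Python, where `sample_frame_indices` is a hypothetical helper and the target of 32 frames is an arbitrary assumption:

```python
def sample_frame_indices(num_frames, num_samples):
    """Pick up to num_samples frame indices spread evenly over the video."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_frames <= num_samples:
        return list(range(num_frames))  # short video: keep every frame
    # Split the video into num_samples equal segments and take each segment's centre
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# Frame counts quoted in the post: 550 (min) and 1115 (max)
for total in (550, 1115):
    indices = sample_frame_indices(total, 32)
    print(total, len(indices), indices[:4], indices[-1])
```

The selected indices can then be used to pick frames when loading each video, so every clip reaches the network with at most 32 time steps.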