Possible reasons for a regression model producing almost the same prediction for every target

I have a model that predicts four numerical target values belonging to similar categories.

I use a CNN to extract features for each category's target value. Before the FC layers, I also append two extra values to the flattened feature maps.

This works in Keras. I rewrote my code in PyTorch, but the prediction results for all four categories are almost the same, with only a little fluctuation.

I worry that this is caused by the backpropagation.
The loss of my model is the RMSE of the four target values over a training batch.

Any ideas?

Hi,

This does not sound like something you shouldn't do.
Could you share some code? In particular, how do you define your model, and how do you add the extra values before the FC layers?

I concatenate the extra values with the feature map.

Here is my model output

		out = torch.cat((x1, x2, x3, x4), 1)
		# print('after_concat', out.shape)  # (batch_size, 4)
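
And here is a simplified sketch of how the two extra values are combined with a flattened feature map before the FC layers (layer sizes and names are placeholders, not my real model):

    import torch
    import torch.nn as nn

    class BranchWithExtras(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            # flattened feature map (16 * 4 * 4) plus the two extra values
            self.fc = nn.Sequential(
                nn.Linear(16 * 4 * 4 + 2, 64), nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, x, extras):
            feat = self.conv(x).flatten(1)            # (batch_size, 256)
            feat = torch.cat((feat, extras), dim=1)   # append the two extra values
            return self.fc(feat)                      # (batch_size, 1) for one category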

Here is my loss function

def RMSELoss_new(yhat, y):
    # RMSE of each target column over the batch, averaged over the four targets
    return torch.mean(torch.sqrt(torch.mean((yhat - y) ** 2, 0)))

criterion = RMSELoss_new
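
Just to be explicit: this takes the RMSE of each target column over the batch and then averages over the four columns, which is not the same as the square root of the overall MSE. A quick check with made-up tensors:

    import torch
    import torch.nn as nn

    yhat = torch.randn(8, 4)   # made-up predictions: batch of 8, 4 targets
    y = torch.randn(8, 4)      # made-up labels

    per_column = RMSELoss_new(yhat, y)            # mean of the per-column RMSEs
    overall = torch.sqrt(nn.MSELoss()(yhat, y))   # sqrt of the MSE over all elements
    print(per_column.item(), overall.item())      # equal only if every column has the same MSE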

I use the SGD optimizer, and in the main training loop I compute the loss and update the model.

			optimizer.zero_grad()

			outputs = net(inputs, tr_angles)   # (batch_size,4)

			# loss = torch.sqrt(criterion(outputs, labels))
			loss = criterion(outputs, labels)
			# print(loss)  # tensor(107.8965, device='cuda:0', grad_fn=<SqrtBackward>)

			loss.backward()
			optimizer.step()

Here is my understanding:

for i in range(1, epoch + 1):
    model.train()
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    if val_print_show_time == i:
        with torch.no_grad():
            model.eval()
            val_outputs = model(val_inputs)
            # early stopping ......

I think I may have made a mistake: I should move the model.eval() outside of the with torch.no_grad(): block.

    if val_print_show_time == i:
        model.eval()
        with torch.no_grad():
            val_outputs = model(val_inputs)
            # early stopping ......
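
And for completeness, I plan to switch back to training mode after the validation step (a sketch; val_labels here is just the matching validation target tensor):

    if val_print_show_time == i:
        model.eval()                   # use running stats for batchnorm, disable dropout
        with torch.no_grad():          # no autograd bookkeeping needed for validation
            val_outputs = model(val_inputs)
            val_loss = criterion(val_outputs, val_labels)
        model.train()                  # back to training mode for the next epoch
        # early stopping ......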

But my model’s performance is not good.

For regression, especially for this kind of multi-value prediction problem, is there a better loss function or optimizer than RMSE and SGD (or Adam)?

All these look good.

  • Moving the .eval() outside of the no_grad() block won't change anything.
  • If you use batchnorm, you might want to double-check how the model behaves without model.eval(). The saved statistics used in eval mode might not be very good in the middle of training, when the model is still changing a lot.
  • You can check that the network initialization is what you expect; that could explain the difference with Keras (see the sketch below).
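
For example, to compare with Keras (which defaults to Glorot uniform weights and zero biases for Dense/Conv2D layers), you could print the parameter statistics or re-initialize the layers explicitly. A rough sketch, not tied to your actual model:

    import torch.nn as nn

    # inspect the current initialization
    for name, p in net.named_parameters():
        print(name, p.mean().item(), p.std().item())

    # or explicitly re-initialize conv/linear layers to match Keras' defaults
    def init_like_keras(m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

    net.apply(init_like_keras)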

If we don't use a validation dataset to do the evaluation, I am not sure whether my model is learning or not.
In that case, how can we track the learning progress during training? Or is this kind of evaluation only for debugging?

Anyway, I will take your suggestions.

You can also do the evaluation with a large batch size, with the network in training mode. That might help you get a better idea while the weights are not yet stable.
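
Something like this (a sketch; val_inputs / val_labels are whatever validation tensors you already use):

    # keep the network in training mode so batchnorm uses the current (large) batch's statistics
    model.train()
    with torch.no_grad():                            # still no gradients needed for evaluation
        val_outputs = model(val_inputs)              # use a large validation batch here
        val_loss = criterion(val_outputs, val_labels)
    # note: this forward pass will still update batchnorm's running statistics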

You mean I can evaluate the model without calling model.eval()?
Is my understanding right? Thanks a lot!

Actually, I use a one-batch validation dataset :grinning: :grinning:

.eval() changes the behavior of some modules. For example, BatchNorm uses the saved running statistics instead of the current batch's statistics, dropout becomes an identity, etc.
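
A quick way to see the difference with dropout, for example:

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)
    x = torch.ones(1, 4)

    drop.train()
    print(drop(x))   # roughly half the entries zeroed, the rest scaled by 1 / (1 - p) = 2

    drop.eval()
    print(drop(x))   # identity: tensor([[1., 1., 1., 1.]])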
