When working with loss functions, the documentation states the input size should be (N, *), where N is the batch size.
The question is: for nets with a single output (a regression net), the output of the net has size (N, 1) (namely a matrix with one column).
Yet the ground-truth data is usually a vector of length N.
Now, if one just subtracts one from the other, broadcasting kicks in and the result is an (N, N) matrix:
import torch

vA = torch.rand((3, 1))
vB = torch.rand(3)
vA - vB  # shape: torch.Size([3, 3])
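To see why the shapes blow up, here is a minimal sketch using NumPy, which follows the same broadcasting rules as PyTorch (trailing dimensions are aligned, and size-1 dimensions are stretched to match):

```python
import numpy as np

# NumPy broadcasting works like PyTorch's, so this reproduces the
# shape blow-up without needing torch installed.
N = 3
vA = np.random.rand(N, 1)   # net output: single-column matrix, shape (N, 1)
vB = np.random.rand(N)      # ground truth: plain vector, shape (N,)

# (N, 1) against (N,): the (N,) is treated as (1, N), then both
# size-1 dims are stretched, so the result is (N, N).
diff = vA - vB
print(diff.shape)           # (3, 3)
```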
Do loss functions avoid this because of the use of a lambda function?
Should one make sure the net output is a plain vector and not a single-column matrix?
Is it best practice to return x.view(-1) at the end of a single-output net?
It depends on the loss function you are using.
For a regression I would try nn.MSELoss, since you are dealing with float outputs.
The doc states that the shape of the input as well as the target has to be (N, *), so as long as both shapes match, no broadcasting occurs.
For a classification use case, the loss functions indeed need a differently shaped target.
This is because in the simplest use case the NLLLoss needs an input of [batch_size, class_log_probabilities] and a target of [batch_size], where each entry stores the target class index.
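The NLLLoss shape contract described above can be sketched in plain NumPy; this is a toy illustration (batch size, class count, and all names are made up), not PyTorch's implementation:

```python
import numpy as np

# Toy batch of 4 samples over 3 classes, with uniform log-probabilities.
batch_size, num_classes = 4, 3
log_probs = np.log(np.full((batch_size, num_classes), 1.0 / num_classes))
target = np.array([0, 2, 1, 2])          # shape (batch_size,): class indices

# NLL picks the log-probability of the target class for each sample
# and negates it; note the target is a 1-D index vector, not one-hot.
per_sample_loss = -log_probs[np.arange(batch_size), target]
print(per_sample_loss.shape)             # (4,)
print(per_sample_loss.mean())            # default mean reduction -> scalar
```

With uniform probabilities every per-sample loss is -log(1/3), so the mean is about 1.0986.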
The net output is (N, 1) (a single-column matrix) while the ground truth is (N,) (a vector).
If you subtract two PyTorch tensors of those shapes, you get an (N, N) output matrix -> broadcasting happened.
Now, if you look at the code of MSELoss (under functional), you see it uses a lambda function.
I'm asking whether that lambda function is the reason no broadcasting happens in the training phase.
Moreover, I'm asking whether it is good practice to flatten the output of a single-output regression network with return x.view(-1).
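The flattening fix being asked about can be sketched as follows (again in NumPy as a torch-free analogue; `reshape(-1)` plays the role of PyTorch's `x.view(-1)`):

```python
import numpy as np

# Flattening the (N, 1) output to (N,) before the subtraction
# keeps the result element-wise instead of broadcasting to (N, N).
N = 3
output = np.random.rand(N, 1)    # stand-in for the net's forward() result
target = np.random.rand(N)

flat = output.reshape(-1)        # analogue of x.view(-1) in PyTorch
diff = flat - target             # (N,) - (N,) -> (N,), no broadcasting
print(diff.shape)                # (3,)
```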
I don’t know about your example, but this works (PyTorch 0.3.1):
import torch
import torch.nn.functional as F
from torch.autograd import Variable

N = 3
vA = Variable(torch.rand((N, 1)))
vB = Variable(torch.rand(N))
F.mse_loss(vA, vB)  # try F.mse_loss(vA, vB, reduce=False)
I’m not sure what difference wrapping them in Variable makes, since the data shapes stay the same and I’d expect the behavior to be consistent whether they are Variables or Tensors.
If you set reduce=False, you will be able to see the output with size (N, 1).
Thanks for the info!
I just tried the lambda function and it’s not the reason for the reduction. So it seems torch._C._nn.mse_loss is responsible for the shape inference.
However, since the latest stable release is 0.4.0, I would suggest updating to that version.
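As a quick sanity check that the lambda cannot be what suppresses broadcasting, here is a sketch (NumPy, same broadcasting rules): the result shape is decided by the operand shapes, not by whether the operation goes through a lambda.

```python
import numpy as np

# Wrapping the subtraction in a lambda changes nothing: broadcasting
# is determined by the operand shapes, not the call site.
N = 3
vA = np.random.rand(N, 1)
vB = np.random.rand(N)

direct = vA - vB
via_lambda = (lambda a, b: a - b)(vA, vB)
print(direct.shape, via_lambda.shape)   # (3, 3) (3, 3) in both cases
```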
@ptrblck,
Do things behave differently with regard to that on 0.4?
There are 2 things to do here:
1. Update the documentation and settle on what the behavior should be (and also advise those who use regression nets to flatten their output in the single-output case).
2. Verify why it works even though it shouldn't.
I hope someone who is a committer will see this.
P.S.
0.4 is still not on the official Anaconda channel.
Once it is, I will update (though I'll probably wait a few days to see if we get 0.4.1).