When working with loss functions, the documentation states the input size should be (N, *), where N is the batch size.
The question is: for nets with a single output (a regression net), the output of the net has size (N, 1) (namely a matrix with one column).
Yet the ground-truth data is usually a vector of length N.
Now, if one just subtracts one from the other, broadcasting kicks in and the result is an (N, N) matrix:
import torch

vA = torch.rand((3, 1))
vB = torch.rand(3)
vA - vB  # shape: torch.Size([3, 3])
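To see why the shapes blow up, here is a minimal sketch using NumPy, which follows the same broadcasting rules as PyTorch (trailing dimensions are aligned, and size-1 dimensions are stretched to match):

```python
import numpy as np

# NumPy broadcasting works like PyTorch's, so this reproduces the
# shape blow-up without needing torch installed.
N = 3
vA = np.random.rand(N, 1)   # net output: single-column matrix, shape (N, 1)
vB = np.random.rand(N)      # ground truth: plain vector, shape (N,)

# (N, 1) against (N,): the (N,) is treated as (1, N), then both
# size-1 dims are stretched, so the result is (N, N).
diff = vA - vB
print(diff.shape)           # (3, 3)
```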
Do loss functions avoid this because of the use of a lambda function?
Should one make sure the net output is a plain vector and not a single-column matrix?
Is it best practice to return x.view(-1) at the end of a single-output net?
It depends on the loss function you are using.
For a regression I would try nn.MSELoss, since you are dealing with float outputs.
The doc states that the shape of the input as well as the target has to be (N, *), so as long as both shapes match, no broadcasting occurs.
For a classification use case, the loss functions indeed need a differently shaped target.
This is because in the simplest use case the NLLLoss needs an input of [batch_size, class_log_probabilities] and a target of [batch_size], where each entry stores the target class index.
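The NLLLoss shape contract described above can be sketched in plain NumPy; this is a toy illustration (batch size, class count, and all names are made up), not PyTorch's implementation:

```python
import numpy as np

# Toy batch of 4 samples over 3 classes, with uniform log-probabilities.
batch_size, num_classes = 4, 3
log_probs = np.log(np.full((batch_size, num_classes), 1.0 / num_classes))
target = np.array([0, 2, 1, 2])          # shape (batch_size,): class indices

# NLL picks the log-probability of the target class for each sample
# and negates it; note the target is a 1-D index vector, not one-hot.
per_sample_loss = -log_probs[np.arange(batch_size), target]
print(per_sample_loss.shape)             # (4,)
print(per_sample_loss.mean())            # default mean reduction -> scalar
```

With uniform probabilities every per-sample loss is -log(1/3), so the mean is about 1.0986.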
The net output is (N, 1) (a single-column matrix) while the ground truth is (N,) (a vector).
If you subtract two PyTorch tensors of those shapes, you get an (N, N) output matrix -> broadcasting happened.
Now, if you look at the code of MSELoss (under functional), you see it uses a lambda function.
I'm asking whether that lambda function is the reason no broadcasting happens in the training phase.
Moreover, I'm asking whether it is good practice to flatten the output of a single-output regression network with return x.view(-1).
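The flattening fix being asked about can be sketched as follows (again in NumPy as a torch-free analogue; `reshape(-1)` plays the role of PyTorch's `x.view(-1)`):

```python
import numpy as np

# Flattening the (N, 1) output to (N,) before the subtraction
# keeps the result element-wise instead of broadcasting to (N, N).
N = 3
output = np.random.rand(N, 1)    # stand-in for the net's forward() result
target = np.random.rand(N)

flat = output.reshape(-1)        # analogue of x.view(-1) in PyTorch
diff = flat - target             # (N,) - (N,) -> (N,), no broadcasting
print(diff.shape)                # (3,)
```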
I don’t know about your example, but this works (PyTorch 0.3.1):
import torch
import torch.nn.functional as F
from torch.autograd import Variable

N = 3
vA = Variable(torch.rand((N, 1)))
vB = Variable(torch.rand(N))
F.mse_loss(vA, vB)  # try F.mse_loss(vA, vB, reduce=False)
I’m not sure what difference wrapping them in Variable makes, since the data shapes stay the same and I’d expect the behavior to be consistent whether they are Variables or Tensors.
If you set reduce=False, you will be able to see the output with size (N, 1).
Thanks for the info!
I just tried the lambda function and it’s not the reason for the reduction. So it seems torch._C._nn.mse_loss is responsible for the shape inference.
However, since the latest stable release is 0.4.0, I would suggest updating to that version.
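As a quick sanity check that the lambda cannot be what suppresses broadcasting, here is a sketch (NumPy, same broadcasting rules): the result shape is decided by the operand shapes, not by whether the operation goes through a lambda.

```python
import numpy as np

# Wrapping the subtraction in a lambda changes nothing: broadcasting
# is determined by the operand shapes, not the call site.
N = 3
vA = np.random.rand(N, 1)
vB = np.random.rand(N)

direct = vA - vB
via_lambda = (lambda a, b: a - b)(vA, vB)
print(direct.shape, via_lambda.shape)   # (3, 3) (3, 3) in both cases
```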
@ptrblck,
Do things behave differently with regard to that on 0.4?
There are 2 things to do here:
1. Update the documentation and settle on what the behavior should be (and also advise those who use regression nets to flatten their output in the single-output case).
2. Verify why it works even though it shouldn't.
I hope someone who is a committer will see this.
P.S.
0.4 is still not on the official Anaconda channel.
Once it is, I will update (though I'll probably wait a few days to see if we get 0.4.1).