Different network output when using batches vs. single samples

I train my model with batch size 128; however, if I don't use batches in the evaluation phase, the network output is wrong.

If the network’s input is in batches:

import torch as tr
import torch.nn as nn

crit = nn.MSELoss(reduction='mean')
target = []
netout = []

model.eval() # To handle drop out layers and batch norm
for A, M, label in dataloaders['val']:
    with tr.set_grad_enabled(False): # We don't need gradient computation in eval mode (speed up)
        out = model(A, M)
    
        target.extend(label.data.cpu().numpy())
        netout.extend(out.data.cpu().numpy())

print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

Result:

torch.Size([849]) torch.Size([849, 1])
Loss: 1.135872483253479

If I iterate over the elements one by one instead, I obtain a different result:

crit = nn.MSELoss(reduction='mean')
target = []
netout = []

model.eval() # To handle drop out layers and batch norm
for A, M, label in testset:
    with tr.set_grad_enabled(False): # We don't need gradient computation in eval mode (speed up)
        out = model(tr.unsqueeze(A, 0), tr.unsqueeze(M, 0))

    target.append(label.item())
    netout.append(out.item())
print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

Result:

torch.Size([849]) torch.Size([849])
Loss: 87.88565063476562

But the result should be exactly the same, because the output is deterministic!
What could be the problem? I can overfit the model for a specific batch size, but if I evaluate on single elements, then the result is wrong.

Could you print the shape of tr.Tensor(target) and tr.Tensor(netout) before passing them to the criterion?
Also, which criterion are you using at the moment?

crit = nn.MSELoss(reduction='mean')
For batch input:
torch.Size([849]) torch.Size([849, 1])

For single input:
torch.Size([849]) torch.Size([849])

I edited the post to include this information.

It seems you might be accidentally broadcasting the inputs to your criterion.
In the latest PyTorch version (1.2.0) you should get a warning.
Make sure to pass the input and target as [batch_size, 1] or [batch_size] (not mixed).
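To see why the mixed shapes inflate the loss, here is a minimal pure-Python sketch (no PyTorch, just reproducing the broadcasting semantics): a [N] target against a [N, 1] output broadcasts to [N, N], so every output is compared against every target instead of only its own.

```python
def mse_matched(target, netout):
    # Both shape [N]: element-wise comparison, the intended loss.
    n = len(target)
    return sum((o - t) ** 2 for o, t in zip(netout, target)) / n

def mse_broadcast(target, netout_col):
    # target has shape [N], netout_col has shape [N, 1].
    # Broadcasting them yields an [N, N] grid: row i uses
    # netout_col[i][0], column j uses target[j].
    n = len(target)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (netout_col[i][0] - target[j]) ** 2
    return total / (n * n)  # mean over all N*N elements

target = [1.0, 2.0, 3.0]
netout = [1.1, 1.9, 3.2]
print(mse_matched(target, netout))                  # small, correct loss
print(mse_broadcast(target, [[o] for o in netout])) # much larger value
```

With real tensors, `target.squeeze()` / `netout.squeeze()` (or `unsqueeze`-ing the target) makes both shapes [N] and removes the mismatch.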

Yes, if I correct this (with squeeze), I get the correct loss:
Loss: 0.8491315245628357

But the issue remains – the network output still differs between batched and single-sample input.

That shouldn’t be the case.
Could you post the model architecture so that we could have a look?

I have already spent a lot of time on this problem. My criterion is CrossEntropyLoss; the input is [batch, num_class] and the target is [batch]. Everything works well with batched training and testing, but it does not work on a single sample. I think the NN sees the data inside the batch, so we would have to train the model with batch size 1.

If you have a reproducible code snippet, we could look into it. :slight_smile:

I think, I found the bug:

        _, _, H, W = A.shape
        mask2 = tr.div(mask2*H*W, 2*tr.norm(mask2, p=1))

Here I compute the norm over the entire batch, which is wrong!
I'll try to correct this by computing the norm of the mask separately for each sample in the batch…

This was the problem! It now works with this:

        B, _, H, W = A.shape
        norm = 2*tr.norm(mask1, p=1, dim=(1,2,3))
        norm = norm.reshape(B, 1, 1, 1)
        mask1 = tr.div(mask1*H*W, norm)

Thank you for your help!

I’m glad it’s working now! :slight_smile:
