Different network output when using batches vs. single samples

I train my model with batch size 128; however, if I don't use batches in the evaluation phase, the network output is wrong.

If the network’s input is in batches:

import torch as tr
import torch.nn as nn

crit = nn.MSELoss(reduction='mean')
target = []
netout = []

model.eval() # To handle drop out layers and batch norm
for A, M, label in dataloaders['val']:
    with tr.set_grad_enabled(False): # We don't need gradient computation in eval mode (speed up)
        out = model(A, M)
    
        target.extend(label.data.cpu().numpy())
        netout.extend(out.data.cpu().numpy())

print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

Result:

torch.Size([849]) torch.Size([849, 1])
Loss: 1.135872483253479

If I iterate over the elements one by one instead, I obtain a different result:

crit = nn.MSELoss(reduction='mean')
target = []
netout = []

model.eval() # To handle drop out layers and batch norm
for A, M, label in testset:
    with tr.set_grad_enabled(False): # We don't need gradient computation in eval mode (speed up)
        out = model(tr.unsqueeze(A, 0), tr.unsqueeze(M, 0))

    target.append(label.item())
    netout.append(out.item())
print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

Result:

torch.Size([849]) torch.Size([849])
Loss: 87.88565063476562

But the result should be exactly the same, because the output is deterministic!
What could be the problem? I can overfit the model for a specific batch size, but if I evaluate on single elements, then the result is wrong.

Could you print the shape of tr.Tensor(target) and tr.Tensor(netout) before passing them to the criterion?
Also, which criterion are you using at the moment?

crit = nn.MSELoss(reduction='mean')
For batch input:
torch.Size([849]) torch.Size([849, 1])

For single input:
torch.Size([849]) torch.Size([849])

I edited the post to include this information.

It seems you might be accidentally broadcasting the inputs to your criterion.
In the latest PyTorch version (1.2.0) you should get a warning.
Make sure to pass the input and target as [batch_size, 1] or [batch_size] (not mixed).
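To see why the mixed shapes inflate the loss, here is a minimal pure-Python sketch (no PyTorch, just reproducing the broadcasting semantics): a [N] target against a [N, 1] output broadcasts to [N, N], so every output is compared against every target instead of only its own.

```python
def mse_matched(target, netout):
    # Both shape [N]: element-wise comparison, the intended loss.
    n = len(target)
    return sum((o - t) ** 2 for o, t in zip(netout, target)) / n

def mse_broadcast(target, netout_col):
    # target has shape [N], netout_col has shape [N, 1].
    # Broadcasting them yields an [N, N] grid: row i uses
    # netout_col[i][0], column j uses target[j].
    n = len(target)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (netout_col[i][0] - target[j]) ** 2
    return total / (n * n)  # mean over all N*N elements

target = [1.0, 2.0, 3.0]
netout = [1.1, 1.9, 3.2]
print(mse_matched(target, netout))                  # small, correct loss
print(mse_broadcast(target, [[o] for o in netout])) # much larger value
```

With real tensors, `target.squeeze()` / `netout.squeeze()` (or `unsqueeze`-ing the target) makes both shapes [N] and removes the mismatch.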

Yes, if I correct this (with squeeze), I get the correct loss:
Loss: 0.8491315245628357

But the issue remains – the network output still differs between batched and single-sample input.

That shouldn’t be the case.
Could you post the model architecture so that we could have a look?

I have already spent a lot of time on this problem. My criterion is CrossEntropyLoss; the input is [batch, num_class] and the target is [batch]. Everything works well with batched training and testing, but it does not work on a single sample. I think the NN sees the data inside the batch, so we would have to train the model with batch size 1.

If you have a reproducible code snippet, we could look into it. :slight_smile:

I think, I found the bug:

        _, _, H, W = A.shape
        mask2 = tr.div(mask2*H*W, 2*tr.norm(mask2, p=1))

Here I compute the norm over the entire batch, which is wrong!
I'll try to correct this by computing the norm of the mask separately for each sample in the batch…

This was the problem! It now works with this:

        B, _, H, W = A.shape
        norm = 2*tr.norm(mask1, p=1, dim=(1,2,3))
        norm = norm.reshape(B, 1, 1, 1)
        mask1 = tr.div(mask1*H*W, norm)

Thank you for your help!

I’m glad it’s working now! :slight_smile:
