I understand backpropagation, but I am confused about the case where the batch size is greater than 1. Let's say I have split the dataset with batch_size = 32, so my outputs have length 32 (`outputs = model(sequence)`). However, when I apply `loss = criterion(outputs, labels)`, this loss is a single float instead of 32 losses, and then I call `loss.backward()`. So I am treating the whole batch as a whole, and I worry this will affect my model's generalization: I do not want the data files within one batch to be related to each other.

Hi @L_Z,

When you call `loss = criterion(outputs, labels)`, you're implicitly averaging over all your data. If you want all 32 losses, you can change the reduction method on the loss to `'none'`. In the case of the cross-entropy loss function (docs below), you can see the `reduction` kwarg, which specifies how to reduce your data to a single loss value. If you want all individual loss values (across the entire batch), pass `reduction='none'` when you instantiate the loss function object.

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
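To make this concrete, here is a minimal sketch with cross-entropy on a batch of 32 (the shapes and class count are just illustrative): `reduction='none'` returns one loss per example, and the default scalar is simply their average.

```python
import torch

# Default: a single scalar loss averaged over the batch.
criterion = torch.nn.CrossEntropyLoss()  # reduction='mean' by default
# Per-sample: one loss value for each of the 32 examples.
criterion_none = torch.nn.CrossEntropyLoss(reduction='none')

logits = torch.randn(32, 10)           # batch of 32 examples, 10 classes
labels = torch.randint(0, 10, (32,))   # one class index per example

batch_loss = criterion(logits, labels)       # 0-dim tensor (scalar)
per_sample = criterion_none(logits, labels)  # shape: (32,)

print(per_sample.shape)                               # torch.Size([32])
print(torch.allclose(per_sample.mean(), batch_loss))  # True
```

So the scalar you see is not mixing the examples in any other way than averaging their individual losses.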

Thank you for getting back to my question. I am not sure what you mean by averaging over all my data. In the general case, where we want to treat the training data separately, is it fine to average over all the data? Or would another way to solve this be to set batch_size = 1? Would that also work?

Let's take the MSELoss function (docs below) as the example (the `reduction` kwarg works the same way for the cross-entropy function):

```
criterion = torch.nn.MSELoss(size_average=None,
                             reduce=None,
                             reduction='mean')
```

(`size_average` and `reduce` are deprecated; `reduction` is the argument to use.)

As can be seen, the default option for `reduction` is `'mean'`, so it will return the `loss` as the average over all individual loss values. If you set this to `'none'`, it'll return all individual loss values.

https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#mseloss

For example,

```
import torch
criterion = torch.nn.MSELoss()  # defaults to reduction='mean'
criterion_ps = torch.nn.MSELoss(reduction='none')
x = torch.randn(4)
y = torch.randn(4)
criterion(x, y)     # e.g. tensor(0.6103) -- a single averaged value
criterion_ps(x, y)  # e.g. tensor([0.3331, 1.2524, 0.0025, 0.8534]) -- one loss per element
```

(The exact numbers depend on the random inputs.)

If we take the mean of the `criterion_ps` output, we get the same result as `criterion`, i.e. `criterion_ps(x,y).mean()` returns `tensor(0.6103)`.
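And to address the generalization worry directly: calling `backward()` on the mean-reduced loss produces exactly the same gradient as averaging the per-sample losses yourself, so the examples in a batch only interact through that average. A minimal sketch with a toy linear model (the weights and data here are hypothetical, just for illustration):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # toy "model": a single weight vector
x = torch.randn(4, 3)                   # batch of 4 examples
y = torch.randn(4)

# Gradient from the default mean-reduced loss.
loss = torch.nn.MSELoss()(x @ w, y)
loss.backward()
grad_mean = w.grad.clone()

# Gradient from per-sample losses, averaged by hand.
w.grad = None
per_sample = torch.nn.MSELoss(reduction='none')(x @ w, y)
per_sample.mean().backward()

print(torch.allclose(grad_mean, w.grad))  # True
```

So batch_size = 1 would also "work", but it buys you nothing here beyond slower training: the mean reduction is already just an average of independent per-example losses.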