I understand backpropagation, but I am confused about the case where the batch size is greater than 1. Let's say I have split the dataset with batch_size = 32, so my outputs have length 32 (`outputs = model(sequence)`). However, when I apply `loss = criterion(outputs, labels)`, this loss is a single float instead of 32 losses, and then I call `loss.backward()`. So I am treating the whole batch as a whole, and I worry this will affect my model's generalization: I do not want the data files within one batch to be related to each other.

Hi @L_Z,

When you call `loss = criterion(outputs, labels)`, you're implicitly averaging over all your data. If you want all 32 losses, you can change the reduction method on the loss to `'none'`. In the case of the cross-entropy loss function (docs below), you can see the `reduction` kwarg, which specifies how to reduce your data to a single loss value. If you want all individual loss values (across the entire batch), pass `reduction='none'` when you instantiate the loss function object.

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
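To make this concrete, here is a minimal sketch with cross-entropy on a batch of 32 (the shapes and class count are just illustrative): `reduction='none'` returns one loss per example, and the default scalar is simply their average.

```python
import torch

# Default: a single scalar loss averaged over the batch.
criterion = torch.nn.CrossEntropyLoss()  # reduction='mean' by default
# Per-sample: one loss value for each of the 32 examples.
criterion_none = torch.nn.CrossEntropyLoss(reduction='none')

logits = torch.randn(32, 10)           # batch of 32 examples, 10 classes
labels = torch.randint(0, 10, (32,))   # one class index per example

batch_loss = criterion(logits, labels)       # 0-dim tensor (scalar)
per_sample = criterion_none(logits, labels)  # shape: (32,)

print(per_sample.shape)                               # torch.Size([32])
print(torch.allclose(per_sample.mean(), batch_loss))  # True
```

So the scalar you see is not mixing the examples in any other way than averaging their individual losses.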

Thank you for getting back to my question. I am not sure what you mean by averaging over all my data. In the general case, where we want to treat the training data separately, is it fine to average over all the data? Or would another way to solve this be to set batch_size = 1? Would that also work?

Let's take the MSELoss function (docs below) as the example (the `reduction` kwarg works the same way for the cross-entropy function):

```
criterion = torch.nn.MSELoss(size_average=None,
                             reduce=None,
                             reduction='mean')
```

(`size_average` and `reduce` are deprecated; `reduction` is the argument to use.)

As can be seen, the default option for `reduction` is `'mean'`, so it will return the `loss` as the average over all individual loss values. If you set this to `'none'`, it'll return all individual loss values.

https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#mseloss

For example,

```
import torch
criterion = torch.nn.MSELoss()  # defaults to reduction='mean'
criterion_ps = torch.nn.MSELoss(reduction='none')
x = torch.randn(4)
y = torch.randn(4)
criterion(x, y)     # e.g. tensor(0.6103) -- a single averaged value
criterion_ps(x, y)  # e.g. tensor([0.3331, 1.2524, 0.0025, 0.8534]) -- one loss per element
```

(The exact numbers depend on the random inputs.)

If we take the mean of the `criterion_ps` output, we get the same result as `criterion`, i.e. `criterion_ps(x,y).mean()` returns `tensor(0.6103)`.
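And to address the generalization worry directly: calling `backward()` on the mean-reduced loss produces exactly the same gradient as averaging the per-sample losses yourself, so the examples in a batch only interact through that average. A minimal sketch with a toy linear model (the weights and data here are hypothetical, just for illustration):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # toy "model": a single weight vector
x = torch.randn(4, 3)                   # batch of 4 examples
y = torch.randn(4)

# Gradient from the default mean-reduced loss.
loss = torch.nn.MSELoss()(x @ w, y)
loss.backward()
grad_mean = w.grad.clone()

# Gradient from per-sample losses, averaged by hand.
w.grad = None
per_sample = torch.nn.MSELoss(reduction='none')(x @ w, y)
per_sample.mean().backward()

print(torch.allclose(grad_mean, w.grad))  # True
```

So batch_size = 1 would also "work", but it buys you nothing here beyond slower training: the mean reduction is already just an average of independent per-example losses.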