Hi, I’m wondering about batch size, which clearly influences the epoch-loss graph. Suppose the following tensors are the model’s predictions and the real values; *the resulting loss differs depending on how elements are distributed between the batches* (the values are random and not normalized).

```
import torch
import torch.nn as nn

loss = nn.MSELoss()
predicted_batch_1 = torch.tensor([40., -1., -0.7, 1.])
ground_truth_batch_1 = torch.tensor([1.1, 180., 15., 1.])
predicted_batch_2 = torch.tensor([1., 89.5])
ground_truth_batch_2 = torch.tensor([1., 14.])
output1 = loss(predicted_batch_1, ground_truth_batch_1)  # 8630.1748
output2 = loss(predicted_batch_2, ground_truth_batch_2)  # 2850.1250
total_loss = output1 + output2  # 11480.2998
```

If we instead concatenate the predicted tensors, and likewise the ground truths, the result is 6703.4917 — a single mean over all six elements rather than a sum of two per-batch means.
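To make the arithmetic behind that number explicit, here is the same concatenated MSE computed in plain Python (no PyTorch needed, just the definition of mean squared error):

```python
# Concatenated predictions and ground truths from the two batches above.
predicted = [40.0, -1.0, -0.7, 1.0, 1.0, 89.5]
ground_truth = [1.1, 180.0, 15.0, 1.0, 1.0, 14.0]

# MSE over all six elements at once: one mean, not a sum of per-batch means.
squared_errors = [(p - g) ** 2 for p, g in zip(predicted, ground_truth)]
overall_mse = sum(squared_errors) / len(squared_errors)
print(round(overall_mse, 4))  # 6703.4917
```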

**HEREIN LIES THE PROBLEM**: if we swap one element between the two batches, in both the predictions and the ground truths (as in the following snippet), the result changes. So we can’t simply use the mean of the batch losses to plot EPOCH_LOSS, since the mean over the batches above differs from the one below.

```
predicted_batch_1 = torch.tensor([40., -1., -0.7, 89.5])
ground_truth_batch_1 = torch.tensor([1.1, 180., 15., 14.])
predicted_batch_2 = torch.tensor([1., 1.])
ground_truth_batch_2 = torch.tensor([1., 1.])
output1 = loss(predicted_batch_1, ground_truth_batch_1)  # 10055.2373
output2 = loss(predicted_batch_2, ground_truth_batch_2)  # 0.0
total_loss = output1 + output2  # 10055.2373
```
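For what it’s worth, one workaround I’m aware of (a plain-Python sketch, not necessarily the canonical approach) is to accumulate the *sum* of squared errors plus the element count per batch, and divide only once at the end — equivalent to using `reduction='sum'` in PyTorch and dividing by the total number of elements. That quantity is the same no matter how elements are shuffled between batches:

```python
def sse(pred, target):
    """Sum (not mean) of squared errors for one batch."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def epoch_loss(batches):
    """Total squared error over all batches divided by total element count."""
    total_sse = sum(sse(pred, target) for pred, target in batches)
    total_count = sum(len(pred) for pred, _ in batches)
    return total_sse / total_count

original = [
    ([40.0, -1.0, -0.7, 1.0], [1.1, 180.0, 15.0, 1.0]),
    ([1.0, 89.5], [1.0, 14.0]),
]
swapped = [
    ([40.0, -1.0, -0.7, 89.5], [1.1, 180.0, 15.0, 14.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]

# Both batchings give the same epoch loss (~6703.4917).
print(epoch_loss(original), epoch_loss(swapped))
```

Is this the recommended way to get a batching-invariant epoch loss, or is there a better convention?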

I’m looking forward to your valuable comments.