Hi,

I have two groups of parameters (a test dataset and an extrapolation dataset, with the same number of elements) saved in two different .pth files, which supposedly should be the same size. I've used this snippet to compute the in-memory size: both test and extrapolation return 73,400,648 bytes, while my training set returns 384,420,168. In particular I run:

```
sum(p.nelement() * p.element_size() for p in extrapolation.parameters())
```
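For reference, since the loaded objects are indexed like dicts further down, the equivalent check for a plain dict of tensors looks like this (toy stand-in data here; in my case the dict comes from `torch.load`):

```python
import torch

# Toy stand-in for a dict loaded with torch.load("extrapolation.pth")
data = {"a": torch.zeros(4, 3), "b": torch.ones(2)}

# Sum of logical tensor sizes: elements * bytes-per-element
total = sum(t.nelement() * t.element_size() for t in data.values())
print(total)  # (4*3 + 2) elements * 4 bytes = 56
```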

However, the file sizes on disk are very different (extrapolation is 71M and test is 417M), the latter being almost the same size as train (433M).

Train and test come from the same data distribution, while extrapolation's is not that different; in any case, I'm not sure the data distribution (rather than tensor dtype and size) should be the primary driver of .pth file size.
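One hypothesis I can't rule out: if the test tensors were created as slices/views of the train tensors before saving, `torch.save` serializes the whole backing storage, not just the view, which would make the file much larger than the tensors' logical size. A minimal sketch of that check (made-up sizes mirroring my shapes; `untyped_storage()` is the recent-PyTorch API, older versions use `.storage()`):

```python
import torch

big = torch.zeros(10726, 100, 20)   # same size as a train tensor
view = big[:2048]                   # a slice sharing big's storage

logical = view.nelement() * view.element_size()
backing = view.untyped_storage().nbytes()   # what torch.save would actually write
print(logical, backing)   # 16384000 vs 85808000

# .clone() copies the slice into its own, right-sized storage before saving
fixed = view.clone()
print(fixed.untyped_storage().nbytes())     # 16384000
```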

Running `for k in extrapolation: print(extrapolation[k].dtype)`

I’ve made sure both test and extrapolation contain only torch.float32 tensors, and when comparing shapes I get identical results for test and extrapolation, with train differing only in the total number of points:

```
for k in test: print(extrapolation[k].shape, test[k].shape, train[k].shape)
torch.Size([2048, 100, 20]) torch.Size([2048, 100, 20]) torch.Size([10726, 100, 20])
torch.Size([2048, 99, 20]) torch.Size([2048, 99, 20]) torch.Size([10726, 99, 20])
torch.Size([2048, 100, 5]) torch.Size([2048, 100, 5]) torch.Size([10726, 100, 5])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1]) torch.Size([1]) torch.Size([1])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1]) torch.Size([1]) torch.Size([1])
torch.Size([2048, 100, 20]) torch.Size([2048, 100, 20]) torch.Size([10726, 100, 20])
torch.Size([2048, 99, 20]) torch.Size([2048, 99, 20]) torch.Size([10726, 99, 20])
torch.Size([2048, 100, 5]) torch.Size([2048, 100, 5]) torch.Size([10726, 100, 5])
```
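For what it's worth, the 73,400,648 figure is exactly consistent with these shapes at float32 (4 bytes per element), so the in-memory accounting seems right:

```python
from math import prod

# Shapes from the printout above (the extrapolation/test column)
shapes = [
    (2048, 100, 20), (2048, 99, 20), (2048, 100, 5),
    (1, 20), (1, 20), (1,), (1, 20), (1, 20), (1,),
    (2048, 100, 20), (2048, 99, 20), (2048, 100, 5),
]

total_bytes = sum(prod(s) for s in shapes) * 4  # float32 = 4 bytes/element
print(total_bytes)  # 73400648
```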

What could be going on?

Thanks!