Two equivalent groups of parameters have very different file sizes when stored as .pth files

Hi,

I have two groups of parameters (a test dataset and an extrapolation dataset, with the same number of elements) saved in two different .pth files, which should therefore have the same size. I’ve used the approach from this comment to compute the memory, and they both return 73,400,648. My train set instead returns 384,420,168. In particular I run:

sum(p.nelement() * p.element_size() for p in extrapolation.parameters())

However, the file sizes are very different (extrapolation 71M and test 417M), the latter being almost the same size as train (433M).
Train and test come from the same data distribution, which is not that different from extrapolation’s; in any case, I’m not sure the data distribution (as opposed to tensor type and size) should be the primary driver of the .pth file size.
Running for k in extrapolation: print(extrapolation[k].dtype) I’ve made sure that both test and extrapolation only contain torch.float32 tensors, and when comparing tensor shapes I get identical results for test and extrapolation, with train differing only in the total number of points:

for k in test: print(extrapolation[k].shape, test[k].shape, train[k].shape)

torch.Size([2048, 100, 20]) torch.Size([2048, 100, 20]) torch.Size([10726, 100, 20])
torch.Size([2048, 99, 20]) torch.Size([2048, 99, 20]) torch.Size([10726, 99, 20])
torch.Size([2048, 100, 5]) torch.Size([2048, 100, 5]) torch.Size([10726, 100, 5])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1]) torch.Size([1]) torch.Size([1])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1, 20]) torch.Size([1, 20]) torch.Size([1, 20])
torch.Size([1]) torch.Size([1]) torch.Size([1])
torch.Size([2048, 100, 20]) torch.Size([2048, 100, 20]) torch.Size([10726, 100, 20])
torch.Size([2048, 99, 20]) torch.Size([2048, 99, 20]) torch.Size([10726, 99, 20])
torch.Size([2048, 100, 5]) torch.Size([2048, 100, 5]) torch.Size([10726, 100, 5])
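
For reference, this is roughly how I compare the computed memory with the size on disk (just a sketch; the file names here are placeholders for my actual .pth paths, and I treat each group as a dict of tensors):

import os
import torch

for name in ['extrapolation', 'test']:  # placeholder file names
    group = torch.load(name + '.pth')
    in_memory = sum(t.nelement() * t.element_size() for t in group.values())
    print(name, in_memory, os.path.getsize(name + '.pth'))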

What could be going on?

Thanks!

I’m not sure I understand the datasets correctly, so please feel free to correct me. :wink:

It seems you are dealing with “groups of parameters”, i.e. are these different state_dicts of different models?
Both return 73 million parameters or MB?
What does “My training instead returns 384,420,168.” mean? Does it mean that the number changes if you run it during training? How was it run before?

The last loop looks as if you are printing the shapes of data samples from different datasets?

Yes, sorry I’m dealing with groups of parameters :slight_smile:

I’m trying to store train, test, and test2 (extrapolation), so they’re not really models, more like groups of tensors for each dataset (mostly X, y, some variations of X, y, and statistics about them). They should have the same sizes and types, except that train has 10,726 elements while test & extrapolation have 2,048 elements each.

Both test & extrapolation return about 73 million bytes from that sum, but the extrapolation file is 71MB and the test file is 417MB (sorry, I now see I put ‘M’ not ‘MB’ in the original post). Train returns about 384 million bytes and its file is 433MB.

The last loop is indeed printing the shape of each parameter in the data; since the parameters are named the same across datasets, it shows that test & extrapolation have the same size for every parameter, while train has some parameters that are bigger along the first dimension (which makes sense).
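
For completeness, the per-key check looks roughly like this (a sketch; it assumes test and extrapolation are dicts mapping the same parameter names to tensors):

import torch

for k in test:
    assert test[k].dtype == extrapolation[k].dtype == torch.float32
    assert test[k].shape == extrapolation[k].shape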

Basically, the only thing that doesn’t make sense to me is the comparative size of the test file, as I would expect it to be the same as the extrapolation file, given that all the internal types and sizes are the same.

Thanks for your help,

Thanks for the information.
I cannot reproduce this issue using your shapes:

s1 = [torch.Size([2048, 100, 20]),
      torch.Size([2048, 99, 20]),
      torch.Size([2048, 100, 5]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([2048, 100, 20]), 
      torch.Size([2048, 99, 20]),
      torch.Size([2048, 100, 5])]

x1 = [torch.randn(s) for s in s1]


s2 = [torch.Size([2048, 100, 20]),
      torch.Size([2048, 99, 20]),
      torch.Size([2048, 100, 5]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([2048, 100, 20]),
      torch.Size([2048, 99, 20]),
      torch.Size([2048, 100, 5])]

x2 = [torch.randn(s) for s in s2]


s3 = [torch.Size([10726, 100, 20]),
      torch.Size([10726, 99, 20]),
      torch.Size([10726, 100, 5]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([1, 20]),
      torch.Size([1, 20]),
      torch.Size([1]),
      torch.Size([10726, 100, 20]),
      torch.Size([10726, 99, 20]),
      torch.Size([10726, 100, 5])]
    
x3 = [torch.randn(s) for s in s3]
    
torch.save(x1, 'x1.pt')
torch.save(x2, 'x2.pt')
torch.save(x3, 'x3.pt')

import os
print(os.path.getsize('x1.pt'))
> 73402073
print(os.path.getsize('x2.pt'))
> 73402073
print(os.path.getsize('x3.pt'))
> 384421593
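
One more thing that might be worth checking (a rough sketch; on older PyTorch versions you might need t.storage().size() * t.element_size() instead of untyped_storage().nbytes()): whether any tensor in the larger file is a view into a bigger storage, since torch.save serializes the entire underlying storage of a view:

# flag entries whose underlying storage is larger than the tensor itself,
# i.e. tensors that are views into a bigger buffer
for name, t in test.items():
    tensor_bytes = t.nelement() * t.element_size()
    storage_bytes = t.untyped_storage().nbytes()
    if storage_bytes > tensor_bytes:
        print(name, tensor_bytes, storage_bytes)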

Could you check my code for differences?

Rekindling this post. My colleague and I are having the same problem. I will try to do two things: (1) give you some more specific information and (2) come up with the simplest version I can to reproduce the behavior.

First, the extra information for my case. If I understand the poster correctly, he has two lists of tensors of the same types and sizes. When he saves each list of tensors to a file, he gets very different file sizes. One would expect the file sizes to be exactly the same if the lists contain tensors of the same sizes and dtypes.

Our problem is the same, except the tensors are identical! However, the way we are generating the tensors is not identical. We are working with wav files, and our features are time slices of an STFT that have large overlapping portions. Our (X, y) pairs are both dtype float32, of sizes (1,41,128) and (1) respectively. Let’s ignore the y elements for a moment (I don’t think they are our issue).

When we generate the data, we take overlapping portions of the STFT. So if the entire STFT of the wav file is a tensor W of size (1,1000,128) and XD is our list of X data elements, then XD[0] is a slice of the first 41 of the 1000 columns of W starting at 0, i.e. XD[0] = W[:,0:41,:], the next element is XD[1] = W[:,1:42,:], and so on. In this example I end up with a Python list of length 959 = 1000 - 41 containing tensors of size (1,41,128). If I clone each element of XD into a different list XD2 and save that, I get a very different file size. Sample code below:

import torch
W = torch.randn(torch.Size([1, 10000, 128]))
XD = [W[:, idx:idx+41, :] for idx in range(10000 - 41)]
XD2 = [torch.clone(X) for X in XD]
torch.save(XD, 'xd.pt')
torch.save(XD2, 'xd2.pt')

import os
print(os.path.getsize('xd.pt'))
> 5895723
print(os.path.getsize('xd2.pt'))
> 211804978

Both of the Python lists contain the same elements, so we are guessing it has something to do with the slicing, indexing, or ordering of the elements and something going on under the covers. We thought order might matter in the list, so we then did:

import random
random.shuffle(XD)

and

torch.save(XD, 'xdshuf.pt')
print(os.path.getsize('xdshuf.pt'))
> 5895723

And now we are really perplexed! We came across this when trying to thread the data generation and had to send the data over a pipe between threads. We have since managed to get the file with the smaller size by using torch’s byte_io method, but we are wondering what exactly is going on here. Any info is much appreciated - and we love PyTorch by the way :slight_smile:
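
In case it helps, here is a minimal sketch of what I mean by the byte approach, assuming it boils down to serializing the whole list at once through an in-memory buffer (torch.save accepts any file-like object):

import io
import torch

buffer = io.BytesIO()
torch.save(XD, buffer)         # serialize the whole list at once into an in-memory buffer
payload = buffer.getvalue()    # raw bytes that can be sent over the pipe
print(len(payload))            # roughly the same size as xd.pt on disk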

Your issue sounds a bit similar to this post. In any case, if you are still seeing this unexpected behavior in the latest nightly binary or master build, could you create an issue on GitHub with a minimal, executable code snippet, please?

Thanks for the quick reply. I read the referenced post and I don’t believe it is related to that. Also, I may not have been clear in what I explained, but it is not necessarily a bug, more just something that PyTorch is doing under the covers that we don’t understand. To reproduce what we are seeing, here is the code snippet:

import torch
W = torch.randn(torch.Size([1, 10000, 128]))
XD = [W[:, idx:idx+41, :] for idx in range(10000 - 41)]
XD2 = [torch.clone(X) for X in XD]
torch.save(XD, 'xd.pt')
torch.save(XD2, 'xd2.pt')

When you compare the lists XD and XD2 they are exactly the same, but when you compare the .pt files that were saved, xd2.pt is about 35 times bigger than xd.pt: 211,804,978 vs. 5,895,723 bytes.
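
For what it’s worth, the comparison I ran looks roughly like this (a sketch continuing from the snippet above):

import os
import torch

# element-wise comparison: every pair has the same shape and values
print(all(torch.equal(a, b) for a, b in zip(XD, XD2)))  # True

# but the files on disk differ by roughly 35x
print(os.path.getsize('xd.pt'), os.path.getsize('xd2.pt'))  # 5895723 vs. 211804978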

Ah, I see. Based on this code snippet, XD is storing the views created by the slicing operation, while you are explicitly cloning the data into XD2 and thus triggering a copy.

Have a look at:

# inplace op on view
x = torch.zeros(5)
xd = x[1:2]
xd.fill_(1.)

print(x)
# tensor([0., 1., 0., 0., 0.])
print(xd)
# tensor([1.])

# clone
x = torch.zeros(5)
xd = torch.clone(x[1:2])
xd.fill_(1.)

print(x)
# tensor([0., 0., 0., 0., 0.])
print(xd)
# tensor([1.])

Here you can also see that the first approach is able to manipulate the original x tensor, while the second works on a copy, so I think the size increase might be expected in your posted code snippet.
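
To make the sharing explicit, here is a small sketch (on older PyTorch versions use t.storage() instead of t.untyped_storage()): all slices in XD point to one and the same storage, which torch.save only needs to write once, while every clone in XD2 owns its own storage and is written out separately.

import torch

W = torch.randn(1, 1000, 128)
XD = [W[:, i:i + 41, :] for i in range(1000 - 41)]
XD2 = [t.clone() for t in XD]

# number of distinct underlying buffers
print(len({t.untyped_storage().data_ptr() for t in XD}))   # 1   -> one shared storage
print(len({t.untyped_storage().data_ptr() for t in XD2}))  # 959 -> independent storages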