RecursionError while saving a model's state dict

Jordan_Juras · March 9, 2020, 1:49pm

Hi,

I am having an ongoing issue with saving the ‘state_dict’ of my model/optimiser. I have followed the guidance from this forum, which encourages saving the state dictionaries of the model and optimiser separately. Unfortunately I cannot do this, because doing so raises a Python RecursionError saying the maximum recursion depth has been exceeded. If I increase the limit with sys.setrecursionlimit, I get a segfault.

I have worked around this by simply saving the model and the optimiser in their entirety. However, I have updated some functionality in the model for visualisation and testing, and I need to transfer the state dict of the legacy model to the new model to use the functionality.

Has anyone else run into this issue?

The model is a convolutional VAE with 4 layers separating the latent space from each of the input and output. The input dimension is [N_batch, 1, 33075], but kernel sizes do not exceed 5 samples. The latent space dimension is on the order of [1, hundreds].

As far as I can tell, this is a standard-size model (?). Training time was not a problem.

Any help would be greatly appreciated.

ptrblck · March 10, 2020, 4:35am

You will get a max recursion error if you have assigned the model to itself somewhere.
This code demonstrates the issue:


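import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = nn.Linear(1, 1)
        # Registers the module as its own child, so state_dict() never terminates
        self.model = self  # will create recursion error

    def forward(self, x):
        x = self.fc1(x)
        return x

model = MyModel()
model.state_dict()  # raises RecursionError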

If you remove the self.model = self assignment, the state_dict can be created.
Could you check your model for such a loop?
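
If the loop is hard to spot by reading the code, something like the helper below might help locate it. This is only a rough sketch: the function name find_module_cycles is made up for illustration, and it assumes the model keeps its registered children in the _modules dict, as nn.Module does. It walks the module tree and reports every attribute path where a child is one of its own ancestors.

```python
def find_module_cycles(root):
    """Return attribute paths where a submodule points back to one of its ancestors.

    Hypothetical helper: it only relies on each module exposing a `_modules`
    dict of registered children, which is how nn.Module stores them.
    """
    cycles = []

    def walk(module, path, ancestors):
        for name, child in getattr(module, "_modules", {}).items():
            if child is None:
                continue
            child_path = f"{path}.{name}" if path else name
            # Identity check: a child that is also an ancestor closes a loop.
            if any(child is ancestor for ancestor in ancestors):
                cycles.append(child_path)
            else:
                walk(child, child_path, ancestors + [child])

    walk(root, "", [root])
    return cycles
```

Calling find_module_cycles(model) on the MyModel example above should return the single path "model", which corresponds to the self.model = self assignment. A module that is merely shared between two places in the tree is not reported, since only references back to an ancestor make state_dict() recurse without end.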