Hi,
I am having an ongoing issue with saving the ‘state_dict’ of my model/optimiser. I have followed the guidance from this forum, which encourages saving the state dictionaries of the model and optimiser separately. Unfortunately I cannot do this, because the attempt raises a Python recursion error stating that the maximum recursion depth has been reached. If I raise the limit with sys.setrecursionlimit, I get a segfault instead.
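For reference, the pattern in question looks roughly like the sketch below. It is only an illustration: the small `nn.Sequential` is a hypothetical stand-in for the real model, and an in-memory buffer is used in place of a checkpoint file.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for the real model (not the actual VAE).
model = nn.Sequential(nn.Conv1d(1, 4, kernel_size=5), nn.ReLU())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save both state dicts in one checkpoint dictionary.
# A BytesIO buffer stands in for a file path here.
buffer = io.BytesIO()
torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    buffer,
)

# Restore into a freshly constructed model of the same architecture.
buffer.seek(0)
checkpoint = torch.load(buffer)
restored = nn.Sequential(nn.Conv1d(1, 4, kernel_size=5), nn.ReLU())
restored.load_state_dict(checkpoint["model"])
```

On my real model, it is this kind of save that fails with the recursion error.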
I have worked around this by simply saving the model and the optimiser in their entirety. However, I have updated some functionality in the model for visualisation and testing, and I need to transfer the state dict of the legacy model to the new model to use the functionality.
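One idea for the transfer, sketched below, is to collect the tensors by hand with `named_parameters()` and `named_buffers()` rather than calling `state_dict()`, and then load them into the new class. This assumes the recursion comes from `state_dict()` itself, which I have not confirmed, so whether it avoids the error is untested. `LegacyModel`, `NewModel` and `visualise` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn


class LegacyModel(nn.Module):
    """Hypothetical stand-in for the old model class."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv1d(1, 4, kernel_size=5)
        self.decoder = nn.ConvTranspose1d(4, 1, kernel_size=5)

    def forward(self, x):
        return self.decoder(self.encoder(x))


class NewModel(LegacyModel):
    """Hypothetical new class: same layers plus extra functionality."""

    def visualise(self, x):
        with torch.no_grad():
            return self.encoder(x)


# In practice the legacy model would come from the whole-object save,
# e.g. torch.load(path, weights_only=False) with the old class importable.
legacy = LegacyModel()

# Build the name -> tensor mapping manually instead of calling state_dict().
tensors = {name: p.detach().clone() for name, p in legacy.named_parameters()}
tensors.update({name: b.detach().clone() for name, b in legacy.named_buffers()})

new = NewModel()
# strict=False reports mismatched keys instead of raising on them.
missing, unexpected = new.load_state_dict(tensors, strict=False)
```

Checking that `missing` and `unexpected` are both empty would confirm that every layer in the new model received weights from the legacy one.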
Has anyone else run into this issue?
The model is a convolutional VAE with 4 layers between the latent space and each of the input and output. The input has shape [N_batch, 1, 33075], but kernel sizes do not exceed 5 samples. The latent space dimension is on the order of [1, hundreds].
As far as I can tell this is a standard-sized model (?), and training time was not a problem.
Any help would be greatly appreciated.