Hopefully the title makes sense (I am relatively new to Python and PyTorch).
Basically, I have two Python files that I use to create a model (an nn.Module subclass), which I then save in full (the entire model object, not just its dictionary of parameters) to a .pth file with torch.save(model, PATH_OF_PTH). In my train.py file I load the model with torch.load(PATH_OF_PTH) and do the usual training, using nn.CrossEntropyLoss() as my loss and optim.SGD(model.parameters(), lr=0.001, momentum=0.9) as my optimizer. With this set-up the loss decreases very slowly and the model barely learns.
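To make the set-up concrete, here is a minimal sketch of what I mean (the `Net` class and the file path are just placeholders standing in for my actual model and .pth file):

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Placeholder nn.Module subclass standing in for my real model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)


path = os.path.join(tempfile.gettempdir(), "model.pth")

# model-building file: save the *entire* model object, not just state_dict
model = Net()
torch.save(model, path)

# train.py: load the full model back and set up training as described
# (weights_only=False is needed on newer PyTorch versions to unpickle
# a full module; older versions accept torch.load(path) as-is)
model = torch.load(path, weights_only=False)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```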
However, if I copy-paste my model-generating code into my training file and train the model directly, without saving and loading it, training is much more effective and the loss decreases a lot faster.
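For comparison, the direct version looks like this (again using a hypothetical `Net` class in place of my real model; the training step is the standard loop with dummy data):

```python
import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Placeholder nn.Module subclass standing in for my real model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)


# model is constructed directly in the training file, no save/load round-trip
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# one ordinary training step on dummy data
x = torch.randn(4, 10)
y = torch.tensor([0, 1, 0, 1])
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

This second version is the one that learns as expected for me.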
Does anyone know what could be the cause of this?
Thank you in advance!