Inconsistent results when loading weights from a JIT ScriptModule into an nn.Module

Hi

import torch

jit_net = torch.jit.load(saved_path)   # load the pre-trained network saved as a ScriptModule
nn_net = TheSameNet()                  # the same architecture as jit_net, but defined as an nn.Module
nn_net.load_state_dict(jit_net.state_dict())

I have a pre-trained ScriptModule and want to change part of its forward pass, so I defined an identical nn.Module class and tried to load the pre-trained weights into this new network with the code above.
The load appears to succeed, but "nn_net" produces different outputs than "jit_net" for the same inputs. Is there a proper way to transfer weights between a JIT ScriptModule and an nn.Module? Am I missing something, is this inconsistent output potentially a bug, or is this kind of transfer simply not recommended?
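A common cause of this symptom is mode-dependent layers rather than the weight transfer itself: Dropout and BatchNorm behave differently in train and eval mode, and the two models may not be in the same mode after loading. Below is a minimal, self-contained sketch (using a toy model with Dropout, not your actual network) showing that once both models are put into eval() mode, the transferred weights produce matching outputs:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.drop = nn.Dropout(0.5)

    def forward(self, x):
        return self.drop(self.fc(x))

# Stand-in for torch.jit.load: script a module directly
jit_net = torch.jit.script(Net())

# Fresh nn.Module, weights copied over via state_dict
nn_net = Net()
nn_net.load_state_dict(jit_net.state_dict())

# In train mode Dropout samples a random mask, so outputs would differ;
# eval() disables it and makes both models deterministic
jit_net.eval()
nn_net.eval()

x = torch.randn(1, 4)
assert torch.allclose(jit_net(x), nn_net(x))
```

If the outputs still differ with both models in eval() mode and no other source of randomness, that would point to a genuine transfer problem.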

I have also run into this kind of inconsistency before.
I gave up on using ScriptModule and instead built a parser for the ScriptModule.graph IR, then used the IR labels to access the state_dict.
But the IR is not a publicly released, stable interface, so this is still not a good way to handle the issue (the IR may change between versions).
I think this is a PyTorch bug that PyTorch needs to fix.
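Parsing the graph IR shouldn't be necessary for the weights themselves: a ScriptModule's state_dict uses the same dotted parameter/buffer names as the original nn.Module, so the keys line up directly. A small sketch (with a hypothetical toy network, not the poster's model) that verifies the key sets match via load_state_dict:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

scripted = torch.jit.script(Net())

# The scripted module exposes the same state_dict keys ("conv.weight",
# "bn.running_mean", ...) as the plain module, including BatchNorm buffers,
# so the weights map back without touching the graph IR.
plain = Net()
missing, unexpected = plain.load_state_dict(scripted.state_dict(), strict=False)
assert missing == [] and unexpected == []
```

With the default strict=True, load_state_dict raises on any missing or unexpected key, which is a quick way to catch an architecture mismatch between the two definitions.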


This is not expected, no, but it could happen if there is a source of non-determinism in the model. If you could file a GitHub issue with a repro, that would be very helpful; ideally a simple model that we can run to reproduce the inconsistency.
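Before filing an issue, it can help to confirm whether a single model is even deterministic on its own, independent of any weight transfer. One quick check (a sketch with a toy model, assuming CPU execution) is to run the same input through the same model twice:

```python
import torch
import torch.nn as nn

# Quick non-determinism check: feed the identical input twice
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.3))
x = torch.randn(2, 4)

model.train()
diff_train = (model(x) - model(x)).abs().max().item()  # usually nonzero: Dropout resamples its mask

model.eval()
diff_eval = (model(x) - model(x)).abs().max().item()   # zero: eval mode disables Dropout
assert diff_eval == 0.0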
