Parameters are different after loading model

magic282 · December 22, 2017, 8:02am

Hi,

I have recently been working on a summarization project. During training, I saved the model that scored best on the development set. However, loading that model and testing it again on the dev set gives me a different ROUGE result (0.18218091939853281 -> 0.18217045231619222). Although the difference is small, it raises serious concerns. My colleagues told me they have encountered this issue too (they observed a drop of about 2 points on their QA task). So I wrote a small script and found that something is indeed different after saving and loading.
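To make the check concrete, here is a minimal sketch of the kind of comparison such a script can do. It is not the gist itself: the small bidirectional LSTM, its sizes, and the helper name `state_dicts_equal` are made up for illustration, and it assumes a recent PyTorch release (`torch.no_grad` does not exist in 0.3.0) running on CPU.

```python
import io

import torch
import torch.nn as nn


def state_dicts_equal(a, b):
    """Return True if both state dicts have the same keys and bit-identical tensors."""
    if a.keys() != b.keys():
        return False
    return all(torch.equal(a[k], b[k]) for k in a)


def make_model():
    # Hypothetical stand-in for the real summarization model.
    return nn.LSTM(input_size=8, hidden_size=16, num_layers=2, bidirectional=True)


torch.manual_seed(0)
model = make_model()

# Save to an in-memory buffer instead of a file to keep the example self-contained.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Load into a freshly constructed model, as one would before re-evaluating.
restored = make_model()
restored.load_state_dict(torch.load(buffer))

params_match = state_dicts_equal(model.state_dict(), restored.state_dict())

# Compare outputs on the same input with dropout and similar layers disabled.
model.eval()
restored.eval()
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)
with torch.no_grad():
    out_a, _ = model(x)
    out_b, _ = restored(x)
outputs_close = torch.allclose(out_a, out_b)

print("parameters identical:", params_match)
print("outputs close:", outputs_close)
```

If the parameters compare equal but the scores still differ, the difference probably comes from somewhere other than the saved weights (for example, evaluation mode or non-deterministic GPU kernels), which is worth ruling out separately.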

I also found this related thread: "Saving and loading a model in Pytorch?"

The code in the GitHub gist runs on Windows (peterjc123’s 0.3.0 build). On Linux (also 0.3.0; I built a Docker image myself, installing PyTorch through conda) it raises an error that I don’t understand:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorCopy.cu line=204 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "test.py", line 120, in <module>
    main()
  File "test.py", line 76, in main
    hidden = hidden.transpose(0, 1).contiguous().view(hidden.size(1), -1)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/autograd/variable.py", line 280, in contiguous
    self.data = self.data.contiguous()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorCopy.cu:204
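
A note for anyone trying to reproduce this: CUDA kernels are launched asynchronously, so the Python line shown in a "device-side assert triggered" traceback is not necessarily the operation that actually failed. A minimal sketch of one way to get a more accurate location is to force synchronous launches through the `CUDA_LAUNCH_BLOCKING` environment variable. This only narrows down where the assert fires; it is not a diagnosis of the error above.

```python
import os

# Must be set before CUDA is initialized, so set it before importing torch
# (or export it in the shell: CUDA_LAUNCH_BLOCKING=1 python test.py).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... import torch and run the failing script as usual after this point.
```

Running the same script on CPU is another option, since CPU errors are raised at the failing call and usually carry a more descriptive message.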

magic282 · December 22, 2017, 8:22am

P.S.
My colleague says that with PyTorch 0.2.x he could solve this problem by removing the flatten_parameters call in the RNN, so he suspects that something in the RNN code is causing this.