Embeddings doing weird things with gradients

I’m grad-checking a model, and seem to have traced a broken gradient to the embeddings I was plugging in.

Here is a minimal example of my code… any thoughts welcome.

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.autograd.gradcheck import gradcheck
from torch.autograd import Variable
import torch

    def testGradCheckEmbeddingBasic(self):
        seqs = ['ghatmasala', 'nicela', 'c-pakodas']
        e = nn.Embedding(10, 3, sparse=False).double()
        indices = Variable(torch.LongTensor([[1], [4]]))
        embed = e(indices)
        print(embed)
        input = (embed, )
        model = nn.Linear(3, 3).double()
        test = gradcheck(model, input, eps=1e-6, atol=1e-4)

The error I get is…

Variable containing:
(0 ,.,.) =
-1.7624 0.1646 -0.5719

(1 ,.,.) =
-0.5188 -0.2282 1.3176
[torch.DoubleTensor of size 2x1x3]

Ran 1 test in 0.232s

FAILED (errors=1)

Error
Traceback (most recent call last):
File "C:\Users\ZEBEAST\Anaconda3\envs\pytorch\lib\unittest\case.py", line 59, in testPartExecutor
yield
File "C:\Users\ZEBEAST\Anaconda3\envs\pytorch\lib\unittest\case.py", line 605, in run
testMethod()
File "C:\Users\ZEBEAST\PycharmProjects\sauron\tests\test_characterEmbedding.py", line 51, in testGradCheckEmbeddingBasic
test = gradcheck(model, input, eps=1e-6, atol=1e-4)
File "C:\Users\ZEBEAST\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\gradcheck.py", line 181, in gradcheck
return fail_test('for output no. %d,\n numerical:%s\nanalytical:%s\n' % (j, numerical, analytical))
File "C:\Users\ZEBEAST\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\gradcheck.py", line 166, in fail_test
raise RuntimeError(msg)
RuntimeError: for output no. 0,
numerical:(
0.3081 -0.1945 -0.0135 0.0000 0.0000 0.0000
-0.0716 0.3882 0.2380 0.0000 0.0000 0.0000
0.3901 -0.1542 0.5559 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.3081 -0.1945 -0.0135
0.0000 0.0000 0.0000 -0.0716 0.3882 0.2380
0.0000 0.0000 0.0000 0.3901 -0.1542 0.5559
[torch.FloatTensor of size 6x6]
,)
analytical:(
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.FloatTensor of size 6x6]
,)

Hi,

The input that you give to gradcheck should be a Variable for which gradients will actually be stored: either a leaf Variable created with requires_grad=True, or a non-leaf Variable on which you call .retain_grad() before passing it to gradcheck, so that its gradients are computed and kept.
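If it helps, here is a minimal sketch of those two options (made-up shapes and names, not your exact test), assuming the same Variable-era API as in your snippet:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable
    from torch.autograd.gradcheck import gradcheck

    model = nn.Linear(3, 3).double()

    # Option 1: a leaf Variable created directly with requires_grad=True
    leaf_input = Variable(torch.randn(2, 3).double(), requires_grad=True)
    print(gradcheck(model, (leaf_input,), eps=1e-6, atol=1e-4))

    # Option 2: a non-leaf Variable (the output of another module),
    # with .retain_grad() called before handing it to gradcheck
    e = nn.Embedding(10, 3).double()
    embed = e(Variable(torch.LongTensor([1, 4])))  # not a leaf: produced by e
    embed.retain_grad()                            # keep its .grad for gradcheck
    print(gradcheck(model, (embed,), eps=1e-6, atol=1e-4))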

Hi Alban, thanks for the reply…

Here’s a screenshot of the “embed” Variable I’m passing in, taken from the debugger…

It says that the embedding Variable requires a gradient. Is that what you mean?

Adding retain_grad() fixes it though! Kinda odd…

    def testGradCheckEmbeddingBasic(self):
        seqs = ['ghatmasala', 'nicela', 'c-pakodas']
        e = nn.Embedding(10, 3, sparse=False).double()
        indices = Variable(torch.LongTensor([[1], [4]]))
        embed = e(indices)
        print(embed)
        embed.retain_grad()  # keep gradients on this non-leaf Variable so gradcheck can check them
        input = (embed, )
        model = nn.Linear(3, 3).double()
        test = gradcheck(model, input, eps=1e-6, atol=1e-4)

Variable containing:
(0 ,.,.) =
-1.5149 0.3036 -0.8191

(1 ,.,.) =
-0.6803 0.7728 2.4776
[torch.DoubleTensor of size 2x1x3]

Ran 1 test in 0.246s

OK

It does require grad, but it is not a leaf Variable (one created directly by the user). The fact that it requires grad only means that its gradient is needed to compute the gradients of the leaf Variables behind it. So its gradient is computed during the backward pass but not saved, to avoid using memory for no reason.
If you want its gradient to be saved, call .retain_grad() on it; that makes autograd store its gradient even though it is not a leaf Variable.
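A tiny illustration of the difference (just a sketch with made-up tensors, not your test):

    import torch
    from torch.autograd import Variable

    # w is a leaf Variable: created directly by the user with requires_grad=True
    w = Variable(torch.randn(3), requires_grad=True)

    # y and z are non-leaf Variables: they require grad because w does,
    # but they were produced by operations, not created by the user
    y = w * 2                 # its gradient will be computed but discarded
    z = w * 3
    z.retain_grad()           # ask autograd to keep this non-leaf gradient

    (y.sum() + z.sum()).backward()

    print(w.grad)             # filled in: leaves always keep their gradient
    print(y.grad)             # None: non-leaf gradients are not saved by default
    print(z.grad)             # filled in, thanks to retain_grad()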


Cool, thanks for the explanation Alban!