How can I keep the gradient when I change one row?

I have a tensor of size (3, 4).
I need to replace its second row with a new tensor of size (1, 4).
How can I do that while keeping the gradient?
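Concretely, the setup looks roughly like this (a sketch with hypothetical names; x is the (3, 4) tensor and new_row is the (1, 4) replacement):

    import torch

    x = torch.randn(3, 4, requires_grad=True)
    new_row = torch.randn(1, 4, requires_grad=True)
    # what I want is the effect of `x[1] = new_row`, but with gradients
    # still flowing into both x and new_row afterwards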
When I use this code, it raises:

        x.masked_fill_(mask, 0)  # set the values of cached nodes in x to 0
        x += emb  # add the embeddings of the cached nodes to x
        return x

RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation: [torch.cuda.FloatTensor [3, 4]], which is output 0 of ReluBackward0, is at version 2
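As far as I understand, the error means that a tensor autograd saved for the backward pass (here the ReLU output) was modified in place: masked_fill_ and += each bump its version, so it is at version 2 when backward runs. A minimal sketch that reproduces the same failure mode (hypothetical shapes, not my actual model) is:

    import torch

    a = torch.randn(3, 4, requires_grad=True)
    x = torch.relu(a)                    # ReluBackward0 saves this output for backward
    mask = torch.randint(0, 2, (3, 4)).bool()
    emb = torch.randn(1, 4)

    x.masked_fill_(mask, 0)              # in-place: version of x becomes 1
    x += emb                             # in-place: version of x becomes 2
    x.mean().backward()                  # RuntimeError: saved tensor was modified in place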
I changed it to:

        out = x.clone()
        out.masked_fill_(mask, 0)  # set the values of cached nodes in x to 0
        out += emb  # add the embeddings of the cached nodes to x
        return out

But this still does not keep the gradient.

My question is similar to this one:

But that thread is several years old, so I want to know whether there is a newer solution.

Cloning the tensor might be the right approach, and I don’t understand the last claim:

This small code snippet works fine for me:

import torch

x = torch.randn(3, 4, requires_grad=True)
mask = torch.randint(0, 2, (3, 4)).bool()
emb = torch.randn(1, 4, requires_grad=True)

out = x.clone()
out.masked_fill_(mask, 0)  # set the values of cached nodes in x to 0
out += emb  # add the embeddings of the cached nodes to x

out.mean().backward()

print(x.grad)
# tensor([[0.0833, 0.0833, 0.0833, 0.0000],
#         [0.0833, 0.0000, 0.0833, 0.0000],
#         [0.0000, 0.0833, 0.0833, 0.0000]])

print(emb.grad)
# tensor([[0.2500, 0.2500, 0.2500, 0.2500]])

Could you describe which gradient exactly won’t be kept?
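(The values also make sense: out.mean() spreads a gradient of 1/12 ≈ 0.0833 over the 12 elements, the masked positions of x get 0, and emb.grad sums the 1/12 contributions over the 3 broadcast rows to 0.25 per column.)

If the remaining in-place ops are still a problem somewhere else in your model, a fully out-of-place variant of the same computation would be (a sketch using torch.where, not the only way to write it):

    import torch

    x = torch.randn(3, 4, requires_grad=True)
    mask = torch.randint(0, 2, (3, 4)).bool()
    emb = torch.randn(1, 4, requires_grad=True)

    # torch.where picks 0 where mask is True and x elsewhere; nothing is
    # modified in place, so no tensor saved for backward can be invalidated
    out = torch.where(mask, torch.zeros_like(x), x) + emb

    out.mean().backward()
    print(x.grad)    # 1/12 at unmasked positions, 0 at masked ones
    print(emb.grad)  # tensor([[0.2500, 0.2500, 0.2500, 0.2500]])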