I am trying to implement the disconnection of weights, i.e., a specific connection is always 0. It sounds like a job for masked_scatter_, but I found that it cannot be tracked by autograd.
I am not entirely clear about the picture you posted (why is the shape of x (3, 2) if there are just three input nodes?). One clear issue with your code, though, is that none of the variables has requires_grad=True. For autograd to track operations, at least one of the inputs should have requires_grad=True.
As far as the implementation of the diagram is concerned, I would do something like this (assuming each node yields a scalar in the first layer):
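For example, a minimal sketch along those lines, keeping the disconnected weights at 0 by multiplying with a fixed 0/1 mask (the shapes, mask values, and input values here are illustrative, not taken from your post):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])                 # three scalar input nodes
weights = torch.randn(3, 2, requires_grad=True)   # full weight matrix, tracked by autograd
mask = torch.tensor([[1., 1.],
                     [1., 0.],                    # this connection is "disconnected"
                     [0., 1.]])                   # so is this one

# Element-wise multiplication with the mask zeroes out the disconnected
# weights while keeping the whole computation differentiable.
out = x @ (weights * mask)
loss = out.sum()
loss.backward()

print(weights.grad)   # gradients of the masked entries are 0
```

Because the mask multiplies the weights inside the graph, the gradient of each masked entry is exactly 0, so those connections receive no updates.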
Can you clarify what you mean by "keep tasked"?
weights.grad_fn is None because weights is a leaf tensor, i.e., the tensor the gradients are computed with respect to, not the result of an operation. At the end, when you finally get a scalar and call .backward(), the right gradients will be accumulated in weights.grad. You can use an optimizer or manually update the weights from there.
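For instance, a small self-contained sketch of that training step (the mask, shapes, and learning rate are placeholders):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
mask = torch.tensor([[1., 1.], [1., 0.], [0., 1.]])
weights = torch.randn(3, 2, requires_grad=True)   # leaf tensor: grad_fn is None

optimizer = torch.optim.SGD([weights], lr=0.1)

loss = (x @ (weights * mask)).sum()   # scalar output
optimizer.zero_grad()
loss.backward()                       # populates weights.grad
optimizer.step()                      # or update manually inside torch.no_grad()
```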
Yes, sure, just one more little question. By "keep tasked" I actually meant "disconnected". So if I use automatic back-propagation, will the disconnected weights still stay disconnected?