Custom weight init: do I need detach()?

Greetings!

Suppose I want to rescale the initialized weight matrix in a certain way, like:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, R):
        super(Net, self).__init__()
        self.lin1 = nn.Linear(2, 100)
        self.lin2 = nn.Linear(100, 3)
        # rescale each row of lin2.weight to have L2 norm R -- this line raises the error below
        self.lin2.weight.mul_((R / torch.sqrt(torch.sum(self.lin2.weight.detach() ** 2, axis=1)))[:, None])

So I want to change the weights of lin2 based on their current values. Do I need the .detach()?
Also, the code gives the error “a leaf Variable that requires grad is being used in an in-place operation”, so the mul_ does not seem to work. Can I just reassign the right-hand expression to my weight matrix, i.e. self.lin2.weight = ...?

Hi,

I’d write it in a (maybe verbose) way like

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, R):
        super().__init__()
        self.lin1 = nn.Linear(2, 100)
        self.lin2 = nn.Linear(100, 3)

        with torch.no_grad():
            # work on an independent copy, rescale each row to L2 norm R,
            # then write the result back into the parameter
            weight = self.lin2.weight.data.clone().detach()
            weight.mul_((R / torch.sqrt(torch.sum(weight ** 2, axis=1)))[:, None])
            self.lin2.weight.data.copy_(weight)
            del weight

    def forward(self, x):
        return self.lin2(F.relu(self.lin1(x)))

A toy colab is here: https://colab.research.google.com/drive/12yqnYoRF1C_of6s8fokTEqK2h9wl-wvj?usp=sharing
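
As a quick sanity check (not in the colab; R = 2.0 is just an example value I picked), the row norms of lin2.weight should come out equal to R right after construction:

import torch

net = Net(R=2.0)  # assumes the Net class defined above
row_norms = net.lin2.weight.norm(dim=1)
print(row_norms)  # each of the 3 rows should have norm ~2.0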


Thank you, that would do!

Do I even need .data and .detach()? Usually .detach() alone is enough, isn’t it?

I’m not sure, but I did it that way because I wanted to work on a plain tensor and avoid sharing the parameter’s storage.
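
To illustrate the storage-sharing point with a toy example (not from the thread): .detach() returns a tensor that shares storage with the original, while .clone() gives an independent copy:

import torch

w = torch.randn(3, requires_grad=True)

view = w.detach()          # shares storage with w
copy = w.detach().clone()  # independent copy

view.zero_()   # also zeroes w in place
print(w)       # tensor([0., 0., 0.], requires_grad=True)
print(copy)    # still holds the original random values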


Yes, don’t use the deprecated .data attribute as it’s dangerous and can yield unwanted side effects! :wink:


Thanks ptrblck.

How would you rewrite crcrpar’s solution, or do you have a better suggestion?

You should be able to remove the .data usage from your code without any additional changes.
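
For completeness, here is a sketch of crcrpar’s __init__ with the .data calls dropped (same structure, just operating on the parameter directly under torch.no_grad()):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, R):
        super().__init__()
        self.lin1 = nn.Linear(2, 100)
        self.lin2 = nn.Linear(100, 3)

        with torch.no_grad():
            # same steps as before, just without .data:
            # copy the weight, rescale its rows, write it back into the parameter
            weight = self.lin2.weight.clone().detach()
            weight.mul_((R / torch.sqrt(torch.sum(weight ** 2, axis=1)))[:, None])
            self.lin2.weight.copy_(weight)

    def forward(self, x):
        return self.lin2(F.relu(self.lin1(x)))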