Hello everyone,
I’m trying to build a model in which one parameter is used in two layers, but in slightly altered form in one of them. It looks something like this:
import torch
from torch import nn

class CrNN(nn.Module):
    def __init__(self, n, m):
        super().__init__()  # required before assigning Parameters or submodules
        self.P1 = nn.Parameter(torch.rand((m, n)))  # weights
        self.P2 = nn.Parameter(torch.rand(m))  # bias
        self.Layer1 = nn.Linear(n, m)
        self.Layer1.bias = self.P2  # This is the correct parameter for the bias
        self.Layer2 = nn.Linear(m, n, bias=False)
        self.Layer2.weight = self.P1  # This is the correct parameter for this layer
Now I would like my net to be described by only these two parameters, with the weights of the first layer given in terms of the P1 parameter as follows:
torch.where(self.P1<0,-self.P1,0)
so for the first layer weights I am only interested in the negated negative entries of the tensor.
How can this be done? If I simply put this calculation in the initializer, I get a new parameter that is independent of P1.
I would suggest keeping your Parameters, P1 and P2, as is, but using
the functional form of linear(), rather than the class Linear.
Because torch.where() can be tricky to backpropagate through (it can
give nan gradients when the branch it does not select has a non-finite
gradient), and assuming that you do want to backpropagate through the
altered layer1 weights, I would use something like minimum() in place
of where(). (The example I give below backpropagates contributions to
the gradient of P1 from its use both in layer1 (in altered form) and
in layer2.)
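As a quick sanity check of the minimum() form, here is a minimal sketch. The tensor P1 below is a hypothetical stand-in for the parameter, initialized with randn (an assumption, so that some entries are negative). It compares the values against the where() expression from the question and checks the gradient of the minimum() form.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for P1; randn is used so that some entries are negative.
P1 = torch.randn(4, 3, requires_grad=True)

# The two formulations of the altered layer1 weights.
w_min = -torch.minimum(P1, torch.zeros(1))
w_where = torch.where(P1 < 0, -P1, torch.zeros_like(P1))
values_match = torch.equal(w_min, w_where)

# Gradient of the summed weights with respect to P1, for the minimum() form.
(grad_min,) = torch.autograd.grad(w_min.sum(), P1)

# Expected: -1 where P1 is negative, 0 where it is positive.
expected = torch.where(P1 < 0, -torch.ones_like(P1), torch.zeros_like(P1))
grads_match = torch.equal(grad_min, expected)

print(values_match, grads_match)
```

Entries that are exactly zero are a tie for minimum() and may be treated differently, so the check above assumes there are none.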
Note, there is no need to wrap P1 and P2 in Linears. As attributes of
your CrNN module, they are first-class Parameters that can be passed
to optimizers, etc.
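To see that bare Parameters are registered without any Linear wrapper, here is a small sketch. The class name TwoParams and the choice of SGD with lr=0.1 are illustrative assumptions only.

```python
import torch

class TwoParams(torch.nn.Module):  # hypothetical minimal module
    def __init__(self, n, m):
        super().__init__()
        # Assigning a Parameter as an attribute registers it with the module.
        self.P1 = torch.nn.Parameter(torch.rand(m, n))
        self.P2 = torch.nn.Parameter(torch.rand(m))

mod = TwoParams(5, 10)

# Both Parameters show up by name, in registration order.
names = [name for name, _ in mod.named_parameters()]
print(names)

# They can be handed to an optimizer directly (SGD is an arbitrary choice here).
opt = torch.optim.SGD(mod.parameters(), lr=0.1)
print(len(opt.param_groups[0]["params"]))
```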
These points are illustrated in the following script:
import torch
print (torch.__version__)

torch.manual_seed (2022)

class CrNN (torch.nn.Module):
    def __init__ (self, n, m):
        super().__init__()
        self.P1 = torch.nn.Parameter (torch.rand (m, n))   # weights
        self.P2 = torch.nn.Parameter (torch.rand (m))      # bias
    def forward (self, input):   # assumes input has shape [*, n], e.g., [nBatch, n]
        layer1_weight = -torch.minimum (self.P1, -torch.zeros (1))
        x = torch.nn.functional.linear (input, layer1_weight, self.P2)   # apply layer1, with bias
        x = torch.nn.functional.linear (x, self.P1.T)                    # apply layer2, no bias
        return x

nBatch = 3
n = 5
m = 10

mod = CrNN (n, m)
print (list (mod.parameters()))

input = torch.randn (nBatch, n)
output = mod (input)
print (input)
print (output)

loss = (output**2).sum()
loss.backward()
print (mod.P1.grad)
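One thing to keep in mind: torch.rand() draws from [0, 1), so with that initialization P1 has no negative entries and the altered layer1 weights start out as all zeros. The sketch below is an illustration only. It assumes randn initialization for P1 instead, and uses a hypothetical helper loss_fn to split the gradient of P1 into its layer1 and layer2 contributions, checking that they add up to the full gradient.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(2022)
n, m, nBatch = 5, 10, 3

# Assumption: randn (not rand) so that P1 has negative entries at initialization.
P1 = torch.randn(m, n, requires_grad=True)
P2 = torch.rand(m, requires_grad=True)
inp = torch.randn(nBatch, n)

def loss_fn(w1_source, w2_source):
    # Hypothetical helper: same forward pass as the module, but with the tensor
    # feeding layer1 and the tensor feeding layer2 passed in separately.
    w1 = -torch.minimum(w1_source, torch.zeros(1))
    x = F.linear(inp, w1, P2)        # layer1, with bias
    x = F.linear(x, w2_source.T)     # layer2, no bias
    return (x ** 2).sum()

# Full gradient: P1 is used in both layers.
(g_full,) = torch.autograd.grad(loss_fn(P1, P1), P1)
# Contribution through layer1 only (layer2 sees a detached copy).
(g_l1,) = torch.autograd.grad(loss_fn(P1, P1.detach()), P1)
# Contribution through layer2 only (layer1 sees a detached copy).
(g_l2,) = torch.autograd.grad(loss_fn(P1.detach(), P1), P1)

sums_match = torch.allclose(g_full, g_l1 + g_l2, rtol=1e-4, atol=1e-4)
print(sums_match)
print(g_l1.abs().sum().item() > 0, g_l2.abs().sum().item() > 0)
```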