How to modify weights of a layer before the layer is applied to input?

ifsheldon · June 27, 2020, 9:59am

Hi! I’m new to PyTorch and I am trying to modify weights of a layer before the layer is applied to input, but I don’t know how to get gradients right.

I have searched the forum and found some related discussions, but they differ from my case: the weights there are not modified at run time during training.

Here’s an example. I want to apply a function (say Sigmoid) to the weight of a Conv2d before it is applied to an image, which means I want to use the sigmoid value of the weights to do the convolution instead of the weights themselves.

Here is my code, which takes the 28*28 vectors of the MNIST dataset as input.

My intention is to save the original weights in a separate attribute (e.g. self.conv1_weight) and, in the forward pass, replace the weights of the conv layers with f(weights), which here is sigmoid(self.conv1_weight), while still preserving the original weights for backpropagation. I was expecting autograd to update self.conv1_weight after opt.step().

But the problem is that no gradients seem to be attached to self.conv1_weight in the forward pass (self.conv1_weight.grad is None after calculating the loss).

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv1_weight = self.conv1.weight.data
        self.conv1_weight.requires_grad_()
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv2_weight = self.conv2.weight.data
        self.conv2_weight.requires_grad_()
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)
        self.conv3_weight = self.conv3.weight.data
        self.conv3_weight.requires_grad_()

    def forward(self, xb):
        relu = F.relu
        sigmoid = torch.sigmoid
        xb = xb.view(-1, 1, 28, 28)
        self.conv1.weight.data = sigmoid(self.conv1_weight)
        xb = relu(self.conv1(xb))
        self.conv2.weight.data = sigmoid(self.conv2_weight)
        xb = relu(self.conv2(xb))
        self.conv3.weight.data = sigmoid(self.conv3_weight)
        xb = relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

My code related to fitting and loss is below

def get_model():
    model = Net()
    return model, optim.SGD(model.parameters(), lr=LR, momentum=0.9)

def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)
    print(model.conv1_weight.grad)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)


def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        orig = model.conv1.weight.data.clone()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl])
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)

train_dl , valid_dl = get_dataloaders(train_ds, valid_ds, BATCH_SIZE)
model, opt = get_model()
fit(10, model, F.cross_entropy,opt, train_dl, valid_dl)

My code is a “workaround”, though it seems to have failed. Could you please help me with that and possibly explain why it failed and how nn.Module actually works? THANKS A LOT!

ptrblck · June 28, 2020, 9:55am

Don’t use the .data attribute, as it might break your code in various ways.
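To illustrate the point about .data, here is a minimal sketch (the tensor names are made up for the demonstration). It suggests why the original code saw no gradients: a tensor obtained through .data carries values only and is cut off from the autograd graph, whereas the result of torch.sigmoid(w) used directly stays connected to w.

```python
import torch

w = torch.randn(4, requires_grad=True)

# Going through .data yields a tensor with the same values but no autograd history.
detached = torch.sigmoid(w).data
# Using the result directly keeps the link back to w.
tracked = torch.sigmoid(w)

print(detached.requires_grad, detached.grad_fn)            # no graph attached
print(tracked.requires_grad, tracked.grad_fn is not None)  # graph attached

# Gradients reach w only through the tracked result.
tracked.sum().backward()
print(w.grad)
```

Assigning to conv.weight.data in forward has the same effect: the convolution sees the sigmoid values, but nothing links them back to the tensor they were computed from.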

For your use case, you could use the functional API as given in this example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # F.conv2d does not read stride/padding from the module, so pass them explicitly
        x = F.conv2d(x, torch.sigmoid(self.conv1.weight), self.conv1.bias, stride=2, padding=1)
        x = F.relu(x)
        x = F.conv2d(x, torch.sigmoid(self.conv2.weight), self.conv2.bias, stride=2, padding=1)
        x = F.relu(x)
        x = F.conv2d(x, torch.sigmoid(self.conv3.weight), self.conv3.bias, stride=2, padding=1)
        x = F.relu(x)
        x = F.avg_pool2d(x, 4)
        return x


model = Net()
data = torch.randn(1, 1, 28, 28)
out = model(data)
out.mean().backward()

for name, param in model.named_parameters():
    print(name, param.grad)
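When using the functional API this way, the layer's stride, padding, dilation and groups are not picked up automatically. A small helper can forward them from the module. The sketch below is only an illustration: conv2d_with_sigmoid_weight is a hypothetical name, and it assumes the layer uses the default padding_mode="zeros".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv2d_with_sigmoid_weight(conv: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: convolve with sigmoid(weight), reusing the layer's settings."""
    return F.conv2d(
        x,
        torch.sigmoid(conv.weight),  # transformed weight stays in the autograd graph
        conv.bias,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        groups=conv.groups,
    )


conv = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
x = torch.randn(2, 1, 28, 28)

out = conv2d_with_sigmoid_weight(conv, x)
out.mean().backward()

print(out.shape)                     # same shape the module itself would produce
print(conv.weight.grad is not None)  # the raw weight receives a gradient
```

The raw weight remains a registered parameter of the module, so an optimizer built from model.parameters() updates it as usual.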

malioboro · December 27, 2022, 6:46pm

Hi, I have a case similar to this question. I currently solve it using the functional API, but I am wondering why we can’t use the normal module API (I mean, like, nn.Linear/nn.Conv2d) and change its weight directly?

In my case, I need to add some functions and parameters to the weights of a pre-trained model, so it would be easier if I could modify the weights without knowing the details of the pre-trained model’s computation.

ptrblck · December 27, 2022, 9:01pm

You can change the parameters, but would need to:

Make sure you are manipulating them inplace unless you explicitly want to replace the parameters with a new object. If you do replace them, the optimizer will lose the reference to the old parameter and the new one won’t be updated unless you add it as a new param group.
Make sure to perform the manipulation in a no_grad() context manager to skip Autograd tracking for this operation.
Make sure not to use the deprecated .data attribute.
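The three points above can be sketched as follows. This is a minimal illustration on a single nn.Linear; optimizer_tracks is a hypothetical helper written only for this demonstration. An in-place update under no_grad() keeps the same parameter object, so the optimizer still holds it, while assigning a new nn.Parameter breaks that link until the new tensor is added with add_param_group.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)


def optimizer_tracks(optimizer, tensor):
    """Hypothetical helper: True if `tensor` is one of the optimizer's parameters."""
    return any(p is tensor for g in optimizer.param_groups for p in g["params"])


# In-place manipulation inside no_grad(): same object, still known to the optimizer.
original = layer.weight
with torch.no_grad():
    layer.weight.mul_(0.5)
inplace_keeps_object = layer.weight is original
tracked_after_inplace = optimizer_tracks(opt, layer.weight)

# Replacing the parameter with a new object: the optimizer does not know about it.
layer.weight = nn.Parameter(torch.zeros(2, 4))
tracked_after_replace = optimizer_tracks(opt, layer.weight)

# Registering the new parameter as an additional param group restores the link.
opt.add_param_group({"params": [layer.weight]})
tracked_after_add = optimizer_tracks(opt, layer.weight)

print(inplace_keeps_object, tracked_after_inplace,
      tracked_after_replace, tracked_after_add)
```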

malioboro · January 2, 2023, 7:28pm

Hi, I have tried your suggestion, and it works, thank you. But I ran into a problem: training keeps getting slower. It started at ~10 it/s and dropped to ~1 it/s after 5 epochs. I have checked that GPU memory usage does not increase. I am not sure why.

I want to change the weight of a pre-trained model to W = A + B, where A is a non-trainable parameter and B is a trainable parameter. This is my current implementation.

def forward(self, x):
    for i, params in enumerate(self.model.parameters()):
        params.copy_(self.A[i].detach())
        params.add_(self.B[i])    
    x = self.model(x)
    return x

I checked the computed W during training and it is correct, but as I said, training gets slower with each epoch. I hope you can help me debug this, please. Thank you.
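One possible alternative for the W = A + B setup, which is not what the code above does, is the parametrization utility in torch.nn.utils.parametrize (assuming a PyTorch version that ships it). The sketch below is only an illustration on a single nn.Linear standing in for a pre-trained layer. AddDelta is a hypothetical module name, and whether this avoids the slowdown described above has not been verified here. The pre-trained weight is kept as the frozen "original" tensor, and the layer's weight is recomputed as A + B on each access, so the wrapped model's own forward does not need to be known.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class AddDelta(nn.Module):
    """Hypothetical parametrization: maps the stored weight A to A + B."""

    def __init__(self, shape):
        super().__init__()
        self.B = nn.Parameter(torch.zeros(shape))  # trainable offset

    def forward(self, A):
        return A + self.B


layer = nn.Linear(4, 2)              # stand-in for one pre-trained layer
A = layer.weight.detach().clone()    # copy kept only for the check below

# After registration, layer.weight is computed as AddDelta(original weight).
parametrize.register_parametrization(layer, "weight", AddDelta(layer.weight.shape))
# Freeze A: the stored original weight no longer receives gradients.
layer.parametrizations.weight.original.requires_grad_(False)

out = layer(torch.randn(3, 4))
out.sum().backward()

delta = layer.parametrizations.weight[0]
print(torch.allclose(layer.weight, A))                      # B starts at zero
print(delta.B.grad is not None)                             # B is trained
print(layer.parametrizations.weight.original.grad is None)  # A is frozen
```

For a full model, the same registration could be applied in a loop over the modules whose weights should be modified, and only the B tensors passed to the optimizer.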