Parameter is supposed to be trainable but does not change

Hello all,

I'm kind of stuck on this problem and it's starting to get very frustrating. I would like to train a model that basically consists of two filter operations with preset trainable weights and a threshold, also preset and trainable.

Here’s the train code:

from model_thresh import Net
import numpy as np
from input_data import LoadData
from process_data import ProcessData
from batch_generator import GenerateImageBatch
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)

def main():
    # Load data
    img_folder = r"..."
    lab_folder = r"..."
    data = LoadData()
    images = data.read_images_from_folder(img_folder)
    images = np.concatenate((images), axis=0)
    labels = data.read_labels_from_folder(lab_folder)
    labels = np.concatenate((labels), axis=0)

    # Preprocess data
    n = 0.2  # Ratio for train and validation data
    data = ProcessData(images, labels, n)
    train_images, train_labels, validation_images, validation_labels = data.split_data()

    # Create batch architecture
    batch_size = 10
    epochs = 3
    train_data = GenerateImageBatch(train_images, train_labels, batch_size)
    validation_data = GenerateImageBatch(validation_images, validation_labels, batch_size)

    # Create model object
    net = Net()
    net.cuda()
    net = net.float()

    # Set up the optimizer, the loss, the learning rate scheduler and the loss scaling for AMP
    optimizer = torch.optim.SGD(net.parameters(), lr=500)
    criterion = nn.BCELoss()

    # Begin training
    for epoch in range(1, epochs + 1):
        net.train()
        train_loss = 0
        for i in range(int(train_data.img_count / batch_size)):
            # Forward pass: Compute predicted y by passing x to the model
            train_data.next_batch()
            inputs = train_data.inputs
            inputs = np.transpose(inputs, (0, 3, 1, 2))

            inputs = torch.from_numpy(inputs)
            inputs = inputs / 255

            outputs = torch.from_numpy(train_data.outputs)
            outputs = np.transpose(outputs.float(), (0, 3, 1, 2))
            inputs, outputs = inputs.cuda(), outputs.cuda()

            y_pred = net(inputs.float())
            y_pred_sum = y_pred - y_pred.min()
            y_pred = y_pred_sum / y_pred_sum.max()
            y_pred = y_pred.cuda()

            # Compute and print loss
            loss = criterion(y_pred, outputs)

            # Zero gradients, perform a backward pass, and update the weights.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Calculate Loss
            train_loss += loss.item()

if __name__ == "__main__":

    main()


And the model code:

import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Gaussian filter
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3,3)).cuda()
        weights = torch.tensor([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=torch.float32).unsqueeze(0).unsqueeze(0)
        weights.requires_grad = True
        with torch.no_grad():
            self.conv1.weight = nn.Parameter(weights)

        # Sobel filter x direction
        self.conv2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3,3), padding = (2,2)).cuda()
        weights = torch.tensor([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=torch.float32).unsqueeze(0).unsqueeze(0)
        weights.requires_grad = True
        with torch.no_grad():
            self.conv2.weight = nn.Parameter(weights)

        self.highThresholdRatio = torch.nn.Parameter(torch.tensor(0.09), requires_grad=True).cuda()

    def threshold(self, img):

        highThreshold = img.max() * self.highThresholdRatio
        res = torch.where(img > highThreshold, torch.tensor(50.).cuda(), img)
        return res

    def forward(self, x):

        x = self.conv1(x)
        print(self.conv1.weight)
        sobel_x = self.conv2(x)
        result = self.threshold(sobel_x)
        print(self.highThresholdRatio)

        return result

Part of the output:

Parameter containing:
tensor([[[[ -7147.7578, -13180.7393, -11188.9844],
          [ -9285.8301, -14252.4600,  -9565.5410],
          [ -8676.7744, -12288.2373,  -7637.9316]]]], device='cuda:0',
       requires_grad=True)
tensor(0.0900, device='cuda:0', grad_fn=<CopyBackwards>)

Can anyone tell me why the value of the parameter “highThresholdRatio” does not change? I’ve gathered from other threads that in-place operations can be the problem, and a “clone()” call inside my threshold function was suggested, but unfortunately that did not solve my issue. The parameter “highThresholdRatio” was still not trainable.

I believe this is because you cannot train a hard threshold parameter. However, you may be able to replace it with soft thresholding instead; see the recent suggestion by @KFrank here.
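
For illustration, here is a minimal sketch of what that could look like for your threshold function, replacing the torch.where with a sigmoid gate (the module name and the sharpness constant are illustrative choices, not something from your code):

import torch
import torch.nn as nn

class SoftThreshold(nn.Module):
    # Sketch of a differentiable "soft" threshold: a sigmoid gate replaces
    # the boolean mask, so gradients can reach highThresholdRatio.
    def __init__(self, ratio=0.09, sharpness=10.0):
        super().__init__()
        self.highThresholdRatio = nn.Parameter(torch.tensor(ratio))
        self.sharpness = sharpness  # larger -> closer to a hard step

    def forward(self, img):
        high_threshold = img.max() * self.highThresholdRatio
        gate = torch.sigmoid(self.sharpness * (img - high_threshold))
        # smooth blend between the two branches of your torch.where
        return gate * 50.0 + (1.0 - gate) * img

As sharpness grows, the gate approaches your original hard cut-off, while the gradient w.r.t. highThresholdRatio stays non-zero.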

Intuitively, you cannot differentiate the loss with respect to a hard threshold in any useful way: its derivative is zero almost everywhere, because an infinitesimal change in the threshold does not change the output for almost any input.
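
You can also see this in a tiny standalone example (just for illustration, not your training code): when the parameter only enters the graph through the boolean comparison, backward() never produces a gradient for it.

import torch

x = torch.randn(4, 4, requires_grad=True)
thr = torch.nn.Parameter(torch.tensor(0.09))

# thr only appears inside the comparison, which yields a boolean mask
# with no grad_fn, so thr is cut out of the autograd graph
hard = torch.where(x > thr * x.max(), torch.tensor(50.), x)
hard.sum().backward()

print(thr.grad)  # None - no gradient ever reaches the threshold parameter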

That did the job, thanks!