Applying torch.nn.Sigmoid() to weights in forward - should I modify backward()?

Hi,
I’m trying to train my small model on MNIST. At the same time, I’m also learning/playing with PyTorch.
I created a custom layer:

import numpy as np
import torch


class MyLinearLayer(torch.nn.Module):
    """Custom linear layer that mimics a standard linear layer (without a bias)."""
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out
        A = torch.empty(size_in, size_out)
        self.A = torch.nn.Parameter(A)  # nn.Parameter registers the tensor as a trainable module parameter

        # initialize the weights uniformly in [-1/sqrt(size_in), 1/sqrt(size_in)]
        y = 1.0 / np.sqrt(size_in)
        torch.nn.init.uniform_(self.A, -y, y)

    def forward(self, x):
        return torch.matmul(x, self.A)
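
As a quick sanity check (just illustrative, not part of my training code), the layer gives the expected output shape:

layer = MyLinearLayer(10, 10)
dummy = torch.randn(4, 10)      # fake batch of 4 feature vectors
print(layer(dummy).shape)       # torch.Size([4, 10])
print(sum(p.numel() for p in layer.parameters()))  # 100 trainable weights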

And here is my model:

class Mnist_Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.cn1 = torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
        self.maxpooling = torch.nn.MaxPool2d(2, 2)
        self.cn2 = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
        self.reduction_layer = MyLinearLayer(10, 10)
        self.fc1 = torch.nn.Linear(in_features=800, out_features=100)
        self.fc2 = torch.nn.Linear(in_features=100, out_features=10)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

    def forward(self, x):
        x = torch.nn.functional.relu(self.cn1(x))
        x = self.maxpooling(x)
        x = torch.nn.functional.relu(self.cn2(x))
        x = self.maxpooling(x)
        x = x.view(-1, self.num_flat_features(x))
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.reduction_layer(x)
        x = torch.sigmoid(x)
        return x

network_MNIST = Mnist_Model()

This approach gives pretty good results: after 1 epoch I get an accuracy of around 97%.
But a small modification in my custom layer - applying the sigmoid function to the weights in forward:

class MyLinearLayer(torch.nn.Module):
    """Same custom linear layer, but with a sigmoid applied to the weights in forward."""
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out
        A = torch.empty(size_in, size_out)
        self.A = torch.nn.Parameter(A)  # nn.Parameter registers the tensor as a trainable module parameter

        # initialize the weights uniformly in [-1/sqrt(size_in), 1/sqrt(size_in)]
        y = 1.0 / np.sqrt(size_in)
        torch.nn.init.uniform_(self.A, -y, y)

    def forward(self, x):
        # apply a sigmoid to the weights before the matmul - this is the only change
        return torch.matmul(x, torch.nn.Sigmoid()(self.A))

stops my model from learning at all: the accuracy drops to 0%.
To make this work, do I need to define backward() in my custom layer so that it accounts for this small change (applying the sigmoid to the weights in forward)?

An accuracy of 0% sounds concerning, as your model would then be doing worse than randomly guessing classes.
Anyway, your approach is valid and you don't need to write a custom backward(): autograd differentiates through the sigmoid applied to self.A automatically, so self.A still gets valid gradients. You could check the values of these gradients and see whether the sigmoid is saturating and thus preventing proper training.
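
For example, something along these lines (a minimal sketch - the batch, labels, and loss here are just placeholders for whatever you actually use) lets you look at the gradient of A and at how saturated sigmoid(A) is:

model = Mnist_Model()
criterion = torch.nn.CrossEntropyLoss()   # placeholder loss

x = torch.randn(8, 1, 28, 28)             # fake MNIST-sized batch
target = torch.randint(0, 10, (8,))       # fake labels

loss = criterion(model(x), target)
loss.backward()

A = model.reduction_layer.A
print(A.grad.abs().mean(), A.grad.abs().max())         # gradient magnitude of the custom weight
print(torch.sigmoid(A).min(), torch.sigmoid(A).max())  # values near 0 or 1 mean the sigmoid is saturated

If sigmoid(A) sits very close to 0 or 1, its derivative is tiny and the gradients flowing into A will mostly vanish, which could explain the failure to train.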