Gradient seems too small when training parameters


I am trying to train an n×n matrix in what looks like a simple scenario. However, I am struggling to obtain a well-trained matrix when n is large (say, above 30). The model is y = sigmoid(Wx), where W is an n×n matrix, x is the input vector (n×1), and y is the output vector (also n×1). The sigmoid is applied to every element of Wx. With a small n, I have no problem converging back to the original matrix. With a larger n, PyTorch seems to have difficulty converging. Here's my simple class and my code:

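import numpy as np
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, W):
        super().__init__()
        # W is passed in as an nn.Parameter, so assigning it registers it for training
        self.W = W

    def forward(self, x):
        # Element-wise sigmoid of the matrix-vector product W @ x
        return torch.sigmoid(torch.matmul(self.W, x))
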
# Initialise true values
x = np.random.rand(1000, 100)  # Input
W_true = np.random.randn(100, 100)  # True matrix
y = np.zeros((x.shape[0], x.shape[1]))
sigma = lambda x: (1 + np.exp(-x)) ** (-1)  # Sigmoid

for i in range(1000):
    y[i] = sigma(W_true @ x[i])  # True output

# Set up
x = torch.tensor(x, dtype=torch.float32)
W_true = torch.tensor(W_true, dtype=torch.float32)
W = torch.randn(100, 100, dtype=torch.float32, requires_grad=True)
W = nn.Parameter(W)
y = torch.tensor(y, dtype=torch.float32)

learning_rate = 100
model = MyModel(W=W)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
loss = nn.MSELoss()
n_iters = 300

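for epoch in range(n_iters):
    # Forward pass, one sample at a time
    y_pred = torch.zeros(x.shape[0], x.shape[1])
    for i in range(1000):
        y_pred[i] = model(x[i])

    l = loss(y_pred, y)

    # Backward pass and update: assumed to be the usual backward/step/zero_grad sequence
    l.backward()
    optimizer.step()
    optimizer.zero_grad()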

I tried playing with the learning rate (which needs to be very high, which seems odd) and with momentum, but neither seems to improve the situation. Any ideas? :smile:
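
To make "too small" concrete, here is a minimal sketch in plain NumPy (separate from the training code above) that computes the analytic gradient of the mean-squared error for y = sigmoid(Wx) and prints its mean absolute entry for a few values of n. The helper names `mse_grad_wrt_W` and `mean_abs_grad` are hypothetical, and the sketch assumes the default mean reduction of `nn.MSELoss`, which divides the summed squared error by all N·n output elements (N samples, n outputs each).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mse_grad_wrt_W(W, x, y):
    """Gradient of mean((sigmoid(x @ W.T) - y) ** 2) with respect to W.

    Shapes: x is (N, n), W is (n, n), y is (N, n).
    Assumes a mean over all N * n elements (hypothetical helper).
    """
    y_pred = sigmoid(x @ W.T)  # row i equals sigmoid(W @ x[i])
    # Chain rule through the squared error and the sigmoid;
    # y.size == N * n comes from averaging over every output element.
    dz = 2.0 * (y_pred - y) * y_pred * (1.0 - y_pred) / y.size
    return dz.T @ x  # shape (n, n), same as W


def mean_abs_grad(n, n_samples=1000, seed=0):
    """Mean absolute gradient entry at a random initial W (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, n))        # inputs in [0, 1), as in the post
    W_true = rng.standard_normal((n, n))  # true matrix
    y = sigmoid(x @ W_true.T)             # true outputs
    W_init = rng.standard_normal((n, n))  # random starting point
    return np.abs(mse_grad_wrt_W(W_init, x, y)).mean()


for n in (5, 30, 100):
    print(n, mean_abs_grad(n))
```

Comparing the printed values across n would show how the gradient entries scale with the matrix size under these assumptions. The same numbers could be cross-checked against `W.grad` after calling `l.backward()` in the PyTorch loop.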

I am still new to PyTorch (and deep learning), and I would really appreciate it if someone could help me train a matrix in an equation like this.