I'm doing gradient descent manually, but something is wrong

Hi, I’m a noob in deep learning as well as in PyTorch. The thing is, I want to build a fully connected network without using a higher-level API like nn.Module. I’ve already done that with numpy, and before diving deep into nn.Module I’d like to do it again in PyTorch using only autograd and tensors. What I did is build a network with 3 hidden layers and 1 output layer. But something goes wrong when I take the gradient descent step, because the accuracy stays around 10% :confused: . I don’t know what the problem is.
I’m using MNIST as the dataset, and each column of x_train is an image reshaped to (784, 1), so each 500-sample batch x_batch has shape (784, 500).
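
Concretely, here is a small sketch of the layout I mean (the tensors are random stand-ins, just to show the shapes):

import torch

# stand-in for the MNIST images, only to illustrate the shapes described above
images = torch.rand(60000, 28, 28)
x_train = images.reshape(-1, 28 * 28).T   # (784, 60000): one flattened image per column
x_batch = x_train[:, 0:500]               # (784, 500): one 500-sample batch
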
Here’s the training code:

for epoch,(x_batch,y_batch) in enumerate(train_batch):
    print(f"In training: epoch {epoch+1}, total {int(train_label.shape[0]/500)}") # 500 samples for a batch
    # Forward propagation
    print("\t"+"Forward propagation")
    f1 = self.fully_connected_layer_1_weight@x_batch + self.fully_connected_layer_1_bias
    activated1 = torch.softmax(f1/f1.sum(dim=0),dim=0)
    f2 = self.fully_connected_layer_2_weight@f1 + self.fully_connected_layer_2_bias
    activated2 = torch.softmax(f2/f2.sum(dim=0),dim=0)
    f3 = self.fully_connected_layer_3_weight@f2 + self.fully_connected_layer_3_bias
    activated3 = torch.softmax(f3/f3.sum(dim=0),dim=0)
    f4 = self.forcast_weight@f3
    activated4 = torch.softmax(f4/f4.sum(dim=0),dim=0)

    # Calculate loss
    print("\t"+"Calculate loss")
    ans = activated4[torch.argmax(input=activated4, dim=0), range(500)]
    loss = ((ans - y_batch)**2).mean()

    # Backward propagation
    print("\t"+"Backard propagation")
    loss.backward(retain_graph=True)

    # Gradient descent
    # I tried to use the -= operator on the weights, but found that -= is recorded by autograd and gets propagated.
    # So I take a dirty workaround like this: create a new, descended tensor without any computation graph,
    # and force self.fully_connected_layer_x_weight to point to the descended tensor.
    # But these weights don't change!
    print("\t"+"Gradient descent")
    self.fully_connected_layer_1_weight=(\
        self.fully_connected_layer_1_weight- \
        self.fully_connected_layer_1_weight.grad*learning_rate).data.clone().detach().requires_grad_(True)
    self.fully_connected_layer_2_weight=(\
        self.fully_connected_layer_2_weight- \
        self.fully_connected_layer_2_weight.grad*learning_rate).data.clone().detach().requires_grad_(True)
    self.fully_connected_layer_3_weight=(\
        self.fully_connected_layer_3_weight- \
        self.fully_connected_layer_3_weight.grad*learning_rate).data.clone().detach().requires_grad_(True)
    self.forcast_weight=(\
        self.forcast_weight- \
        self.forcast_weight.grad*learning_rate).data.clone().detach().requires_grad_(True)

Hi,

Your gradient update is not done properly, I’m afraid:

  • You should not need retain_graph=True. It is only needed here because the gradient update is wrong; you can remove it.
  • To update a weight without autograd tracking the update, you should wrap it in torch.no_grad():
with torch.no_grad():
    self.fully_connected_layer_1_weight -= self.fully_connected_layer_1_weight.grad * learning_rate
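
For example, the whole update step could go inside a single no_grad() block. This is just a sketch using the weight names from your code, and it assumes they are leaf tensors created with requires_grad=True:

with torch.no_grad():
    for w in (self.fully_connected_layer_1_weight,
              self.fully_connected_layer_2_weight,
              self.fully_connected_layer_3_weight,
              self.forcast_weight):
        # in-place update; no_grad() keeps autograd from recording it
        w -= w.grad * learning_rate

The bias tensors can be updated the same way.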

Hi Jack!

And also be sure to zero out your gradients after taking the optimization
step. (Otherwise each loss.backward() call will keep accumulating
into the gradient.) E.g., something like this:

self.fully_connected_layer_1_weight.grad.zero_()
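
For instance, right after the no_grad() update you could reset all of them (a sketch using the same weight names as in your code):

for w in (self.fully_connected_layer_1_weight,
          self.fully_connected_layer_2_weight,
          self.fully_connected_layer_3_weight,
          self.forcast_weight):
    w.grad.zero_()  # clear the accumulated gradient before the next loss.backward()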

Best.

K. Frank

Thanks so much! It’s exactly what I needed. It works!

Thanks, I’ll do that.