Hi, I’m a noob in deep learning as well as in PyTorch. I want to build a fully connected network without using the higher-level API (e.g. nn.Module). I’ve already done that with NumPy, and before diving into nn.Module I’d like to do it again in PyTorch using only autograd and tensors. I built a network with 3 hidden layers and 1 output layer, but something goes wrong during gradient descent: the accuracy stays at about 10%, and I can’t figure out why.
I’m using MNIST as the dataset. Each column of x_train is one image reshaped into a (784, 1) vector, so a batch of 500 samples has shape (784, 500).
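To be concrete, here’s roughly how the batches are shaped (a simplified sketch with random data standing in for a loaded MNIST batch; my actual loading code differs):

```python
import torch

# Stand-in for one MNIST batch of 500 images (illustrative only).
images = torch.rand(500, 1, 28, 28)

# Flatten each image to 784 values, then transpose so every
# column is one sample: (500, 784) -> (784, 500).
x_batch = images.view(500, -1).t()
print(x_batch.shape)  # torch.Size([784, 500])
```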
Here’s the training code:
for epoch, (x_batch, y_batch) in enumerate(train_batch):
    print(f"In training: epoch {epoch+1}, total {int(train_label.shape[0]/500)}")  # 500 samples per batch
    # Forward propagation
    print("\t" + "Forward propagation")
    f1 = self.fully_connected_layer_1_weight @ x_batch + self.fully_connected_layer_1_bias
    activated1 = torch.softmax(f1 / f1.sum(dim=0), dim=0)
    f2 = self.fully_connected_layer_2_weight @ f1 + self.fully_connected_layer_2_bias
    activated2 = torch.softmax(f2 / f2.sum(dim=0), dim=0)
    f3 = self.fully_connected_layer_3_weight @ f2 + self.fully_connected_layer_3_bias
    activated3 = torch.softmax(f3 / f3.sum(dim=0), dim=0)
    f4 = self.forcast_weight @ f3
    activated4 = torch.softmax(f4 / f4.sum(dim=0), dim=0)
    # Calculate loss
    print("\t" + "Calculate loss")
    ans = activated4[torch.argmax(input=activated4, dim=0), range(500)]
    loss = ((ans - y_batch) ** 2).mean()
    # Backward propagation
    print("\t" + "Backward propagation")
    loss.backward(retain_graph=True)
    # Gradient descent
    # I tried using the -= operator on the weights, but found that -= is recorded
    # in the computation graph and gets propagated.
    # So I took this dirty workaround: build new, descended tensors detached from
    # any computation graph, and re-point self.fully_connected_layer_x at them.
    # But these weights don't change!
    print("\t" + "Gradient descent")
    self.fully_connected_layer_1_weight = (
        self.fully_connected_layer_1_weight
        - self.fully_connected_layer_1_weight.grad * learning_rate
    ).data.clone().detach().requires_grad_(True)
    self.fully_connected_layer_2_weight = (
        self.fully_connected_layer_2_weight
        - self.fully_connected_layer_2_weight.grad * learning_rate
    ).data.clone().detach().requires_grad_(True)
    self.fully_connected_layer_3_weight = (
        self.fully_connected_layer_3_weight
        - self.fully_connected_layer_3_weight.grad * learning_rate
    ).data.clone().detach().requires_grad_(True)
    self.forcast_weight = (
        self.forcast_weight
        - self.forcast_weight.grad * learning_rate
    ).data.clone().detach().requires_grad_(True)
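For reference, here is the update pattern I was trying to express, as a minimal self-contained sketch (shapes and names are made up, not my real ones): doing the subtraction under torch.no_grad() keeps it out of the graph without rebuilding the tensor, and zeroing .grad afterwards stops gradients accumulating across batches — which my code above never does.

```python
import torch

# A leaf weight tensor and some fake input (illustrative shapes).
w = torch.randn(10, 784, requires_grad=True)
x = torch.randn(784, 500)
learning_rate = 0.01

loss = (w @ x).pow(2).mean()
loss.backward()

# In-place SGD step: no_grad() means this subtraction is not recorded
# by autograd, and w stays the same leaf tensor object.
with torch.no_grad():
    w -= learning_rate * w.grad

# Reset the gradient so it doesn't accumulate into the next batch.
w.grad.zero_()

print(w.requires_grad)  # True
```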