Two-layer neural net in PyTorch does not converge

I am trying to implement a two-layer neural network in three different ways (TensorFlow, PyTorch, and from scratch) and then compare their performance on the MNIST dataset.

I am not sure what mistake I have made, but the accuracy in PyTorch is only about 10%, which is basically random guessing. I suspect the weights do not get updated at all (see the gradient check after the training loop below).

Note that I intentionally use the dataset loader provided by TensorFlow so that the data fed to all three implementations is identical, for a fair comparison.

from tensorflow.examples.tutorials.mnist import input_data
import torch

# assumed setup (not shown in the original post): load MNIST with
# integer class labels rather than one-hot vectors
mnist_m = input_data.read_data_sets("MNIST_data/", one_hot=False)

class Net(torch.nn.Module):
    def __init__(self):
      super(Net, self).__init__()
      self.fc1 =  torch.nn.Linear(784, 100)
      self.fc2 =  torch.nn.Linear(100, 10)

    def forward(self, x):
      # x -> (batch_size, 784)
      x = torch.relu(x)
      # x -> (batch_size, 10)
      x = torch.softmax(x, dim=1)
      return x

net = Net()
net.zero_grad()
Loss = torch.nn.CrossEntropyLoss()
optimizer =  torch.optim.SGD(net.parameters(), lr=0.01)

for epoch in range(1000):  # loop over the dataset multiple times

    batch_xs, batch_ys = mnist_m.train.next_batch(100)
    # convert to tensors with the appropriate dtypes
    # note the input to the linear layer should be (n_samples, n_features)
    batch_xs = torch.tensor(batch_xs, requires_grad=True)
    # batch_ys -> (batch_size,)
    batch_ys = torch.tensor(batch_ys, dtype=torch.int64)

    # forward
    # output -> (batch_size, 10)
    output = net(batch_xs)
    # result -> (batch_size,)
    result = torch.argmax(output, dim=1)
    loss = Loss(output, batch_ys)

    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
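
To check the suspicion that the weights never update, here is a small debugging sketch (not part of my actual comparison code) that takes one training step and then inspects fc1's gradient and how much its weights changed:

# debugging sketch: snapshot fc1's weights, take one step, then look at
# the gradient and the size of the weight update
before = net.fc1.weight.detach().clone()

output = net(batch_xs)
loss = Loss(output, batch_ys)
optimizer.zero_grad()
loss.backward()
optimizer.step()

grad = net.fc1.weight.grad
print("fc1 grad:", None if grad is None else grad.norm().item())
print("fc1 weight change:", (net.fc1.weight.detach() - before).norm().item())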


You are not using fc1 and fc2 at all.
Your input just passes through relu and softmax, so no learning can happen.

Use fc1 and fc2 in the forward function as follows:

def forward(self, x):
    # x -> (batch_size, 784)
    x = self.fc1(x)
    x = torch.relu(x)
    x = self.fc2(x)
    # x -> (batch_size, 10)
    x = torch.softmax(x, dim=1)
    return x

Also, you need to flatten the tensor before feeding it to Net (I will leave that as an exercise :smiley:)
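
For reference, a minimal sketch of that flattening step, assuming the images arrive as (batch_size, 28, 28); if your loader already returns flat (batch_size, 784) vectors, this reshape leaves the shape unchanged:

batch_xs = batch_xs.view(batch_xs.size(0), -1)  # (batch_size, 28, 28) -> (batch_size, 784)
output = net(batch_xs)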


In addition to what @bhushans23 said, you shouldn't apply a softmax at the end of the model, since nn.CrossEntropyLoss expects raw logits and applies nn.LogSoftmax internally.
Just remove the torch.softmax call from your model and make sure the other layers are actually used.
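
Putting both fixes together, a minimal sketch of the corrected model (an illustration of the two suggestions above, not your exact code):

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 100)
        self.fc2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        # x -> (batch_size, 784)
        x = torch.relu(self.fc1(x))
        # return raw logits -> (batch_size, 10); CrossEntropyLoss applies
        # log-softmax internally, and argmax over the logits gives the same
        # predicted classes as argmax over softmax probabilities
        return self.fc2(x)

With this model, your existing loss = Loss(output, batch_ys) and result = torch.argmax(output, dim=1) lines keep working unchanged.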
