Which parameters does the torch.nn forward() method take? Is PyTorch's forward() equivalent to a standard forward pass?

Hello!

I’ve implemented my own code in Java (and in C++ as well) that constructs and trains neural networks on the MNIST handwritten-digit dataset using plain stochastic gradient descent, achieving 89–90% accuracy. My problem is that when I load PyTorch network parameters (obtained by training on the same MNIST dataset) and pass them to my own forward() function, the model always predicts the same number (a different number for each model), i.e. the 10% accuracy you get when you always predict the same digit.

Am I missing something? Does PyTorch perform a regular forward pass using just the network parameters: compute the matrix product of the activations and the weights, add the biases, and apply the activation function where needed? Or does it perform other operations, or use parameters beyond the tensors, behind the scenes that I’m not aware of? Given that the network structure, the training dataset, and the forward() methods are the same, I expected at least comparable results when loading the PyTorch parameters into my own implementation of supervised learning.
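One detail worth checking: nn.Linear stores its weight with shape (out_features, in_features), so a linear layer computes x @ W.T + b, not x @ W + b. If your Java/C++ code multiplies by the weight matrix without transposing, you would see exactly this kind of constant-prediction behavior. A minimal sketch (layer sizes are arbitrary examples):

```python
import torch

torch.manual_seed(0)

fc = torch.nn.Linear(4, 3)  # weight shape is (3, 4): (out_features, in_features)
x = torch.randn(2, 4)

# the manual forward pass must transpose the stored weight matrix
manual = x @ fc.weight.T + fc.bias

assert torch.allclose(fc(x), manual)
```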

The forward() method in my own implementation is equivalent to the forward() method I used in PyTorch:

def forward(self, x):
    x = torch.flatten(x, 1)  # flatten all dimensions except batch
    x = F.relu(self.fc1(x))
    return self.fc2(x)
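For reference, this forward() can be reproduced with raw tensor operations on the state_dict alone, which is what an external implementation has to match. A sketch, assuming fc1 and fc2 are nn.Linear layers (the 784 → 100 → 10 sizes are placeholders for MNIST):

```python
import torch
import torch.nn.functional as F

class FcNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # hypothetical sizes: 784 -> 100 -> 10
        self.fc1 = torch.nn.Linear(784, 100)
        self.fc2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        return self.fc2(x)

net = FcNet()
sd = net.state_dict()
x = torch.randn(5, 1, 28, 28)

# manual forward pass using only the raw parameter tensors
h = torch.flatten(x, 1)
h = torch.clamp(h @ sd["fc1.weight"].T + sd["fc1.bias"], min=0)  # ReLU
out = h @ sd["fc2.weight"].T + sd["fc2.bias"]

assert torch.allclose(net(x), out)
```

If the external code matches this exactly (including the weight transpose and the flattening order), the predictions should agree.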

I save the PyTorch trained network parameters with this code:

import sys
import torch

torch.set_printoptions(precision=20, threshold=sys.maxsize)
with open(file, 'w') as f:
    # Print the model's state_dict
    f.write("Model's state_dict:")
    for param_tensor in fcNet.state_dict():
        output_string = str(fcNet.state_dict()[param_tensor])
        f.write(output_string)
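If the goal is a text format that non-Python code can parse reliably, writing str(tensor) is fragile (PyTorch may elide or reformat values). A sketch of a more parseable export, writing each parameter's name, shape, and flattened values on separate lines (the file name and toy model are placeholders):

```python
import torch

# toy model standing in for fcNet; names and sizes are illustrative only
fcNet = torch.nn.Sequential()
fcNet.add_module("fc1", torch.nn.Linear(4, 3))

path = "params.txt"
with open(path, "w") as f:
    for name, tensor in fcNet.state_dict().items():
        # one record per parameter: name, space-separated shape, flat values
        f.write(name + "\n")
        f.write(" ".join(str(d) for d in tensor.shape) + "\n")
        f.write(" ".join(repr(v) for v in tensor.flatten().tolist()) + "\n")
```

The Java/C++ side can then read the shape line and reshape the flat value list accordingly, with full float precision preserved by repr().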

The PyTorch accuracy that I get from my PyTorch training code is about 91%.

I think there are a few problems here:

  • You are saving the tensor as a string,
  • which destroys the structure/shape of the tensor
  • and makes it impossible to properly reload the weights back into a model.

I always suggest using torch.save; here is a tutorial: https://pytorch.org/tutorials/beginner/saving_loading_models.html
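For the PyTorch-to-PyTorch case, the standard round trip is short (the layer sizes and file name below are arbitrary examples):

```python
import torch

model = torch.nn.Linear(4, 3)
torch.save(model.state_dict(), "model.pt")  # save only the parameters

# later, or in another script: rebuild the same architecture, then load
restored = torch.nn.Linear(4, 3)
restored.load_state_dict(torch.load("model.pt"))

x = torch.randn(2, 4)
assert torch.allclose(model(x), restored(x))
```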

Hi Eduardo,

and thank you for your answer. Perhaps I haven’t made it clear, but I am not loading the model parameters back into PyTorch code, or into any code that uses PyTorch; rather, I read these parameters in order to load them into my own ML code.

Cheers, R

Now I see the issue; maybe @ptrblck can help you out.