Hello!
I’ve implemented my own code in Java (and in C++ as well) that constructs and trains a neural network on the MNIST handwritten-digit dataset using plain stochastic gradient descent, reaching 89-90% accuracy. My problem: when I load the network parameters that PyTorch produced from training on the same MNIST dataset and pass them to my own forward() function, the model always predicts the same digit (a different digit for each model), i.e. the ~10% accuracy you get from always predicting a single class.
Am I missing something? Does PyTorch use just the network parameters to do regular forward propagation: compute the matrix product of activations and weights, add the biases, and apply the activation function where needed? Or does it perform other operations, or use parameters besides the weight and bias tensors, behind the scenes that I’m not aware of? Provided the network structure, the training dataset, and the forward() methods are the same, I was expecting at least comparable results when loading the PyTorch parameters into my own implementation of supervised learning.
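As far as I understand it, a single nn.Linear layer computes y = x · Wᵀ + b, with the weight stored in shape (out_features, in_features). This is the little check (an illustration only, not my actual code) that my understanding is based on:

import torch
import torch.nn as nn

# Illustration only: checking what nn.Linear computes for one layer.
layer = nn.Linear(784, 30)
x = torch.randn(5, 784)  # dummy batch of 5 flattened 28x28 images

# weight has shape (out_features, in_features), so the manual
# product needs the transpose.
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(manual, layer(x), atol=1e-6))  # prints True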
The forward() method I’ve implemented in my own code is equivalent to the forward() method I used in PyTorch:
def forward(self, x):
    x = torch.flatten(x, 1)  # flatten all dimensions except batch
    x = F.relu(self.fc1(x))
    return self.fc2(x)
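To rule out anything hidden, this is how I would redo that forward pass by hand from the saved tensors (just a sketch; it assumes the layers appear in the state_dict as fc1.* and fc2.*):

import torch
import torch.nn.functional as F

state = fcNet.state_dict()
w1, b1 = state['fc1.weight'], state['fc1.bias']
w2, b2 = state['fc2.weight'], state['fc2.bias']

def manual_forward(x):
    x = torch.flatten(x, 1)    # same flatten as in forward()
    x = F.relu(x @ w1.T + b1)  # fc1 followed by ReLU
    return x @ w2.T + b2       # fc2, raw logits

If manual_forward(x) matches fcNet(x) but my Java port does not, my best guess is the weight layout (each row of fc1.weight belongs to one hidden unit) or the pixel order used when flattening the 28x28 image.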
I save the trained PyTorch network parameters with this code:
import sys
import torch

torch.set_printoptions(precision=20, threshold=sys.maxsize)
with open(file, 'w') as f:
    # Print the model's state_dict
    f.write("Model's state_dict:")
    for param_tensor in fcNet.state_dict():
        output_string = str(fcNet.state_dict()[param_tensor])
        f.write(output_string)
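Since parsing the printed tensor(...) text back in Java is error-prone, an alternative I’m considering is dumping each parameter as plain numbers with its name and shape on a header line (a sketch, assuming NumPy is installed):

import numpy as np

with open(file, 'w') as f:
    for name, tensor in fcNet.state_dict().items():
        arr = tensor.detach().cpu().numpy()
        # header line: parameter name followed by its shape
        f.write(name + ' ' + ' '.join(str(d) for d in arr.shape) + '\n')
        # weights: one row per output unit; biases become a single column
        np.savetxt(f, arr.reshape(arr.shape[0], -1))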
The accuracy I get from the PyTorch training code itself is about 91%.