Recently, I implemented a deep neural network (with only one hidden layer in the current version) and trained it on the MNIST dataset. It is very strange that the classification error rate on the training set is near 99%, which is even worse than random guessing (roughly 90% error on MNIST).
Compared with a traditional NN model, the new model is a little special in two ways:
It has a specific local loss for each layer. This means it cannot be trained with a single `loss.backward()` call. Instead, for each layer I first use `torch.autograd.grad` to compute all gradients of that layer's parameters, and then manually update the parameters by SGD (thanks @rasbt for giving me this implementation idea).
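For reference, the per-layer update I described looks roughly like this. This is a minimal sketch, not my actual code: the layer sizes, the batch, and the squared-activation local loss are all placeholders.

```python
import torch

torch.manual_seed(0)

# Toy setup: one linear layer with a made-up local loss (placeholder).
layer = torch.nn.Linear(784, 128)
x = torch.randn(32, 784)       # fake batch standing in for MNIST images
lr = 0.1

h = layer(x)
local_loss = (h ** 2).mean()   # placeholder local loss for this layer

# Compute gradients for this layer's parameters only, without .backward()
grads = torch.autograd.grad(local_loss, list(layer.parameters()))

# Manual SGD step on this layer's parameters
with torch.no_grad():
    for p, g in zip(layer.parameters(), grads):
        p -= lr * g
```

The key point is that `torch.autograd.grad` returns the gradients as a tuple instead of writing them into `.grad`, so the update has to be done by hand inside `torch.no_grad()`.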
There is no closed-form expression for the hidden-layer representation. This means the representation has to be learned by running an iterative minimization algorithm, rather than by computing activations layer by layer in a single forward pass.
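To make the second point concrete, here is a simplified sketch of what "learning the representation by iterative minimization" means. The weights, the energy function (reconstruction error plus a sparsity penalty), and all sizes are placeholders; my actual objective is different.

```python
import torch

torch.manual_seed(0)

W = torch.randn(128, 784) / 784 ** 0.5   # placeholder decoder weights
x = torch.randn(32, 784)                 # fake batch

# Instead of h = activation(x @ W.T), infer h by minimizing an energy.
h = torch.zeros(32, 128, requires_grad=True)
optimizer = torch.optim.SGD([h], lr=0.5)

def energy(h):
    # placeholder energy: reconstruction error + L1 sparsity penalty
    recon = ((x - h @ W) ** 2).sum(dim=1).mean()
    return recon + 1e-3 * h.abs().sum(dim=1).mean()

e0 = energy(h).item()
for _ in range(100):
    optimizer.zero_grad()
    e = energy(h)
    e.backward()
    optimizer.step()

e_final = energy(h).item()
# h.detach() would then serve as this layer's representation
```

So every forward pass involves an inner optimization loop over `h`, which is much more expensive than a plain matrix multiply plus activation.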
I have put my code on GitHub; you can find it here. I have no idea why the classification error rate is so large. One possible reason, I think, is that some parameters are not being updated properly.
I have been struggling with this problem for about 10 days. Any suggestions or comments would be highly appreciated. Thanks in advance.