How to use residual learning applied to fully connected networks?

Wesley_Neill · October 9, 2020, 7:05pm

Here is my result of applying batch norm:

I think I’ll try SGD instead of Adam, but after that I’m at a loss as to what else to experiment with. I’m just getting much better and more consistent results with a vanilla feed-forward network. I only wish I knew whether that is down to my implementation or to the data.
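One way to start separating "implementation" from "data" is to sanity-check the residual block on its own. The following is a minimal sketch, assuming PyTorch; the names `ResidualBlock`, `make_model`, `make_optimizer`, and `overfit_one_batch`, as well as the layer sizes and learning rates, are hypothetical and not taken from the actual model. It uses a pre-activation layout with batch norm, zero-initializes the last linear layer so each block starts as the identity, and makes it easy to swap Adam for SGD.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Fully connected residual block: out = x + F(x)."""

    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm1d(dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )
        # Assumption: zero-init the last layer so F(x) = 0 at the start
        # and the block behaves as the identity map before training.
        nn.init.zeros_(self.body[-1].weight)
        nn.init.zeros_(self.body[-1].bias)

    def forward(self, x):
        return x + self.body(x)


def make_model(in_dim=8, hidden=32, out_dim=1, n_blocks=2):
    # Hypothetical sizes; replace with the real input/output dimensions.
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        *[ResidualBlock(hidden) for _ in range(n_blocks)],
        nn.Linear(hidden, out_dim),
    )


def make_optimizer(model, name="sgd"):
    # Learning rates are placeholders, not tuned values.
    if name == "sgd":
        return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    return torch.optim.Adam(model.parameters(), lr=1e-3)


def overfit_one_batch(model, opt, x, y, steps=200):
    """Train repeatedly on one fixed batch and return the loss history."""
    loss_fn = nn.MSELoss()
    model.train()
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


# Check 1: a freshly initialized block should return its input unchanged.
torch.manual_seed(0)
block = ResidualBlock(32)
x = torch.randn(16, 32)
identity_at_init = torch.allclose(block(x), x)
```

If `identity_at_init` is false, or if `overfit_one_batch` cannot drive the loss down on a single small batch of the real data, the implementation is the more likely suspect. If both checks pass and the vanilla network still wins, the data (or simply the shallow depth, where skip connections have less to offer) becomes the more plausible explanation.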