Thanks, I actually ended up doing almost exactly that, except I used adaptive pooling. Super cool feature!
I’ve posted some very mixed results here: "How to use residual learning applied to fully connected networks?"
It’s not pretty, but maybe a little tweaking will fix it?