Pooling output of fully connected layers?

Hi there!

I’ve just read the ResNet paper and I decided to implement a version that works on tabular data.

The only hold-up is figuring out how to map the input to a residual block down to the same size as the block’s output. I figured a nice way to do this might be some form of pooling. I thought MaxPool1d might do the trick, but when I looked at the docs I saw that it expects a 3D tensor with a channel dimension, not the 2D output of a linear layer.

Before I try to go create some silly version of my own, is there anything built into PyTorch that does this for you?

If you want to use the pooling kernel on the feature dimension of the linear layer’s output, you could unsqueeze a channel dimension, apply the pooling, and squeeze the channel dimension again.
Note that the kernel would pool neighboring features. The output activations of linear layers don’t contain spatially correlated features, so I would also recommend testing other strategies, such as keeping the largest outputs or calculating some form of correlation factor.
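Something along these lines, with made-up sizes just for illustration (shrinking 512 features down to 256 for the shortcut):

```python
import torch
import torch.nn as nn

batch_size, in_features = 8, 512           # hypothetical sizes
x = torch.randn(batch_size, in_features)   # output of a linear layer: [N, features]

pool = nn.MaxPool1d(kernel_size=2, stride=2)  # pools pairs of neighboring features

x = x.unsqueeze(1)   # [N, 1, features]      -> add a channel dim
x = pool(x)          # [N, 1, features // 2]
x = x.squeeze(1)     # [N, features // 2]

print(x.shape)  # torch.Size([8, 256])
```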

Let us know how your experiments went! :slight_smile:

Thanks, I actually ended up doing almost exactly that except I used adaptive pooling. Super cool feature!
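For anyone landing here later, a minimal sketch of the adaptive variant (sizes again made up): `nn.AdaptiveMaxPool1d` lets you specify the target size directly instead of working out the kernel and stride yourself.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 512)              # block input: [N, in_features]

pool = nn.AdaptiveMaxPool1d(256)     # target size = residual block's output features

shortcut = pool(x.unsqueeze(1)).squeeze(1)  # [N, 256], ready to add to the block output
print(shortcut.shape)                        # torch.Size([8, 256])
```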

I’ve posted some very mixed results here: How to use residual learning applied to fully connected networks?

It’s not pretty, but maybe a little tweaking will fix it?