1D convolution along the channel axis of a 2D image

I have a 2D image with lots (hundreds) of channels.
Nearby channels are highly correlated.

For now I'm using an entry group of several Conv2d layers with kernel size (1, 1).
It works OK.
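
Roughly like this (the layer widths here are just placeholders, not my real config):

import torch.nn as nn

# entry group: stacked 1x1 convolutions, mixing all channels at every pixel
entry = nn.Sequential(
	nn.Conv2d(138, 64, kernel_size=1), nn.ReLU(),
	nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),
)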

But I assume that doing a 1D convolution along the channel axis, before the spatial 2D convolutions,
would let me build a smaller and more accurate model.

I've created this straightforward wrapper
for converting the (N, C, H, W) layout to (N*H*W, 1, C) (which Conv1d can consume),
and back:

class InChannelConv(nn.Module):

	def __init__(self, body):
		super().__init__()
		self.body = body  # any stack of Conv1d layers running along the channel axis

	def forward(self, x):

		n2, c2, h2, w2 = x.size()

		# (N, C, H, W) -> (N, H, W, C) -> (N*H*W, 1, C): each pixel becomes a 1-channel "signal"
		x = x.permute(0, 2, 3, 1).view(n2*h2*w2, 1, c2).contiguous()
		x = self.body(x)
		# and back: (N*H*W, 1, C') -> (N, H, W, C') -> (N, C', H, W)
		x = x.view(n2, h2, w2, -1).permute(0, 3, 1, 2).contiguous()

		return x

And I'm using my wrapper like this:

InChannelConv(nn.Sequential(
	nn.Conv1d(1, 32, 7), nn.ReLU(),
	nn.Conv1d(32, 64, 3), nn.ReLU(),
	nn.Conv1d(64, 1, 1), nn.ReLU(),
	))
...
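
For clarity, the shape flow I expect with my input (computed by hand, so treat the numbers as illustrative):

	# (1, 138, 256, 256)    input (N, C, H, W)
	# permute + view     -> (1*256*256, 1, 138) = (65536, 1, 138)
	# Conv1d(1, 32, 7)   -> (65536, 32, 132)
	# Conv1d(32, 64, 3)  -> (65536, 64, 130)
	# Conv1d(64, 1, 1)   -> (65536, 1, 130)
	# view + permute     -> (1, 130, 256, 256)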

That gives me OOM, even with a very small body.

When I use a uselessly small body like Conv1d(1, 1, 1), I get:

RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

I know about nn.PixelShuffle, but it's not exactly what I want.
It mixes spatial and “in-channel” convolution,
and moreover I can't make any assumptions about the channel count.

So here is the question:
how do I make an efficient 1D convolution along the channel axis (independent for every pixel),
with a many-channel 2D image as both input and output?

As far as I know, the call to contiguous copies the data and returns a new tensor, so this might be the explanation for your OOM error. Could you tell how large x is in your case?

Also, I think you might have to call contiguous after permute (and before view), since permute returns a non-contiguous tensor. Could you try that and run your code again?
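
Something like this, reusing your variable names:

	# contiguous() must sit between permute() and view(),
	# because permute() returns a non-contiguous tensor
	x = x.permute(0, 2, 3, 1).contiguous().view(n2 * h2 * w2, 1, c2)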

My current input size is [1, 138, 256, 256] in float32,
which is about 36 MB (1 × 138 × 256 × 256 × 4 bytes).

The rest of the network is (for testing purposes) just one small Conv2d.
So I have about 18k params in total.

My current best model works fine on the same input, with about 2.5M params (Titan Black, 6 GB RAM).

So, as far as I understand, even tens of copies of x would not be enough to fill the GPU RAM.

I’ve tried this implementation:

	def forward(self, x):

		n2, c2, h2, w2 = x.size()

		x = x.permute(0, 2, 3, 1)
		x = x.contiguous()
		x = x.view(n2 * h2 * w2, 1, c2)
		x = x.contiguous()

		x = self.body(x)

		x = x.view(n2, h2, w2, -1)
		x = x.contiguous()
		x = x.permute(0, 3, 1, 2)
		x = x.contiguous()

		return x

And even this one, with the reshapes done on the CPU:

		n2, c2, h2, w2 = x.size()

		# reshape on the CPU, so the extra copies don't touch GPU memory
		x = x.cpu()

		x = x.permute(0, 2, 3, 1)
		x = x.contiguous()
		x = x.view(n2 * h2 * w2, 1, c2)
		x = x.contiguous()

		x = x.cuda()
		x = self.body(x)
		x = x.cpu()

		x = x.view(n2, h2, w2, -1)
		x = x.contiguous()
		x = x.permute(0, 3, 1, 2)
		x = x.contiguous()

		x = x.cuda()

		return x

The result is stable:
with a big body, OOM;
with a small body, CUDNN_STATUS_NOT_SUPPORTED.

My last thought about CUDNN_STATUS_NOT_SUPPORTED is that Conv1d does not support such a large batch size.
I found this related issue: https://github.com/pytorch/pytorch/issues/4107

Does it make any sense to try manually splitting the whole array into small batches?
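
Something like this is what I have in mind for the wrapper's forward (an untested sketch; the chunk size 4096 is an arbitrary guess, and torch is assumed imported):

	def forward(self, x):
		n2, c2, h2, w2 = x.size()
		x = x.permute(0, 2, 3, 1).contiguous().view(n2 * h2 * w2, 1, c2)
		# run the body chunk by chunk, to bound the peak memory and the effective batch size
		x = torch.cat([self.body(part) for part in torch.split(x, 4096, dim=0)], dim=0)
		x = x.view(n2, h2, w2, -1).permute(0, 3, 1, 2).contiguous()
		return x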