CNN: "Calculated padded input size" error


#1

I’m trying to replicate a Keras model which starts out like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 720, 1, 1)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 720, 1, 128)       1152      
_________________________________________________________________

The input is (batch_size, 720, 1, 1), and a Conv2D layer with 128 filters and a kernel size of 8 is applied to it. In trying to replicate this in PyTorch, I have:

import torch

a = torch.randn(32,720, 1, 1)
print('a:', a.size()) # a: torch.Size([32, 720, 1, 1])

torch.nn.Conv2d(720, 128, kernel_size=8, stride=1)(a)  # raises the RuntimeError below

But I’m getting the following error…

RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (8 x 8). Kernel size can’t greater than actual input size at /pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:48

Any ideas what I’m doing wrong, and why this works in Keras but not in PyTorch?


#2

Could you post the keras code?
It doesn’t really make sense to use a kernel size of 8 on an input with 1x1 spatial dimensions.
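PyTorch treats the second axis as channels, so `(32, 720, 1, 1)` is read as 720 channels of a 1x1 image. The output-size formula from the `torch.nn.Conv2d` documentation, H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1, makes the problem concrete. The sketch below uses a hypothetical helper `conv_out_size` (not a PyTorch function) to apply that formula to the shapes in this thread:

```python
def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of one spatial dimension, per the torch.nn.Conv2d shape formula."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# torch.nn.Conv2d reads a tensor of shape (32, 720, 1, 1) as N=32, C=720, H=1, W=1,
# so an 8x8 kernel would have to slide over a 1x1 "image".
print(conv_out_size(1, 8))    # -6: no valid output position, hence the RuntimeError

# With the length-720 axis placed in H and an (8, 1) kernel, both dimensions are valid.
print(conv_out_size(720, 8))  # 713
print(conv_out_size(1, 1))    # 1
```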


#3

Sure. Here’s the first part of the Keras model.

x = keras.layers.Input(x_train.shape[1:])
conv1 = keras.layers.Conv2D(128, 8, 1, border_mode='same')(x)
conv1 = keras.layers.normalization.BatchNormalization()(conv1)
conv1 = keras.layers.Activation('relu')(conv1)
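One way to check how Keras interpreted `Conv2D(128, 8, 1, ...)` is the parameter count in the model summary from the first post. A 2-D convolution has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter. The arithmetic below (the helper name `conv2d_param_count` is hypothetical) shows that the reported 1152 matches an 8x1 kernel over 1 input channel, not an 8x8 one. That suggests the third positional argument acted as a kernel width (the older `nb_row, nb_col` style) rather than a stride; treat that reading of the legacy signature as an assumption inferred from the numbers.

```python
import torch

def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

print(conv2d_param_count(8, 1, in_channels=1, filters=128))  # 1152, matches conv2d_4 in the summary
print(conv2d_param_count(8, 8, in_channels=1, filters=128))  # 8320, what an 8x8 kernel would need

# The same count, taken directly from a PyTorch layer with an (8, 1) kernel.
conv = torch.nn.Conv2d(1, 128, kernel_size=(8, 1))
print(sum(p.numel() for p in conv.parameters()))             # 1152
```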

#4

@ptrblck: The full Keras code & preprocessing can be viewed here (github).


(colesbury) #5

Your Keras model probably uses NHWC ordering for the convolutions, whereas PyTorch uses NCHW.
This means your input should be transposed. Also, it looks like you’re doing a 1-D convolution, so you probably want to change the kernel size to (8, 1).
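A minimal sketch of that transpose, assuming the data arrives in the Keras channels-last layout `(batch, 720, 1, 1)`, i.e. NHWC with H=720, W=1, C=1. `Tensor.permute` reorders the axes into NCHW; the variable names are illustrative.

```python
import torch

x_nhwc = torch.randn(32, 720, 1, 1)   # Keras layout: (N, H, W, C)
x_nchw = x_nhwc.permute(0, 3, 1, 2)   # PyTorch layout: (N, C, H, W)
print(x_nchw.shape)                   # torch.Size([32, 1, 720, 1])

# Because C and W are both 1 here, a plain view gives the same values,
# but permute keeps working if the channel count ever changes.
print(torch.equal(x_nchw, x_nhwc.view(32, 1, 720, 1)))  # True
```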

import torch
a = torch.randn(32, 1, 720, 1)
torch.nn.Conv2d(1, 128, kernel_size=(8, 1), stride=1)(a)  # output: torch.Size([32, 128, 713, 1])

Here N=32, C=1, H=720, W=1, where N is the batch size, C is the number of channels, and H and W are the spatial dimensions of the input. In the output, C will be 128.
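The Keras layer also uses `border_mode='same'`, which keeps the output length at 720 (see the `(None, 720, 1, 128)` output shape in the summary), while an unpadded (8, 1) convolution produces only 713 rows. Below is a sketch of the whole first Keras block (convolution, batch norm, ReLU) in PyTorch, assuming PyTorch 1.9 or newer for `padding='same'`. With an even kernel of 8, the total padding of 7 rows is split asymmetrically (3 before, 4 after, which should match TensorFlow's convention of putting the extra row at the end); the sketch checks this against an explicit `F.pad`. PyTorch may print a warning that even kernel lengths require a zero-padded copy of the input; that is expected. The `BatchNorm2d` arguments are an assumption meant to mirror Keras's defaults (`epsilon=1e-3`, `momentum=0.99`), since PyTorch's `momentum` is the complementary weight given to the new batch statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 1, 720, 1)  # NCHW: N=32, C=1, H=720, W=1

# Conv2D(128, (8, 1), padding='same') -> BatchNormalization -> ReLU, in PyTorch terms.
block = nn.Sequential(
    nn.Conv2d(1, 128, kernel_size=(8, 1), stride=1, padding='same'),
    nn.BatchNorm2d(128, eps=1e-3, momentum=0.01),  # assumed mapping of Keras BN defaults
    nn.ReLU(),
)
out = block(x)
print(out.shape)  # torch.Size([32, 128, 720, 1])

# 'same' with an even kernel pads 3 rows on top and 4 on the bottom; compare with
# explicit padding. F.pad takes (W_left, W_right, H_top, H_bottom) for a 4-D input.
conv = block[0]
manual = F.conv2d(F.pad(x, (0, 0, 3, 4)), conv.weight, conv.bias)
print(torch.allclose(conv(x), manual, atol=1e-5))  # True
```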


#6

Makes sense - thanks!