Apply function "convolutionally" to input

I have implemented a function which takes a tensor of size (batch_size x width x height) as input, and returns a tensor of size (batch_size x 1 x 1).

What I would like to do is apply this “convolutionally” to a tensor - in other words, sliding a window across the input tensor, and applying the function to each of these windows in turn, producing an output which (with padding) is the same size as the input tensor.

What would be the best/simplest way to do this?

Is there any reason that prevents you from using nn.Conv2d? I don’t know the exact purpose of your function but at the first glance, I think nn.Conv2d can do the same thing for you.
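For reference, here is a minimal sketch of that suggestion; the channel counts, kernel size, and padding are placeholder choices, not anything from your setup:

```python
import torch
import torch.nn as nn

# Minimal sketch: nn.Conv2d slides a window over the input and maps
# each window to one value per output channel. With padding=1 and
# kernel_size=3, the output has the same spatial size as the input.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

x = torch.randn(8, 1, 16, 16)   # (batch, channels, width, height)
y = conv(x)
print(y.shape)                   # same spatial size as the input
```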

@KaiyangZhou Unfortunately, nn.Conv2d doesn’t work in this case. In essence, nn.Conv2d runs each window of the input through a fully connected linear layer, but my function works quite differently.

Perhaps you could explain in more detail about what you want to achieve, so other people who have done this can help you.

Perhaps a better way of explaining it: I’m trying to find a cleaner way of implementing the following pseudocode:

def convolutional_function(input):
    for x in range(width):
        for y in range(height):
            index window of input
            intermediate_output = myfunction(indexed_input)
            concatenate intermediate inputs

I don’t really understand what “intermediate inputs” is.
Maybe Tensor comprehensions could help you out.

I agree, tensor comprehensions would definitely work well, except for the fact that, at present, they only support fixed-size inputs.

OK, but could you post your method in “Einstein notation”? Maybe we could still use vanilla convolutions.

Unfortunately, I do not currently know “Einstein notation”.

No worries, let’s go through your example.

index window of input

Are you indexing it using x and y, or something like x:x+width, etc.?
Is the intermediate output just a scalar?
What happens in the last line?

Thanks very much for your help. In answer to your questions:

  1. It’s “something like x:x+width etc.”: it indexes a portion of the input. In fact, I think it would be more akin to x - 0.5*kernel_size : x + 0.5*kernel_size (a window centred on x)
  2. Yes (or a tensor of size (batch_size x 1 x 1))
  3. The last line isn’t really correct. My point was just that the scalar outputs need to be joined together into a 2d tensor

BTW: the pseudocode does not account for padding, but this would be necessary to ensure input and output sizes of the function are equal
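Putting those answers together, a minimal runnable sketch might look like the following. The zero padding, the odd kernel_size, and the example window function are my assumptions; your function is stood in for by any callable mapping a (batch, k, k) window to (batch, 1, 1):

```python
import torch
import torch.nn.functional as F

def apply_convolutionally(inputs, fn, kernel_size):
    """Naive sketch: slide a kernel_size x kernel_size window over a
    (batch, width, height) tensor, apply fn to each window, and stack
    the (batch, 1, 1) outputs back into a (batch, width, height) tensor.
    Assumes odd kernel_size and zero padding."""
    batch, width, height = inputs.shape
    pad = kernel_size // 2
    # zero-pad the last two dims so output size matches input size
    padded = F.pad(inputs, (pad, pad, pad, pad))
    rows = []
    for x in range(width):
        cols = []
        for y in range(height):
            window = padded[:, x:x + kernel_size, y:y + kernel_size]
            cols.append(fn(window))           # (batch, 1, 1)
        rows.append(torch.cat(cols, dim=2))   # (batch, 1, height)
    return torch.cat(rows, dim=1)             # (batch, width, height)

# example stand-in function: per-window max, (batch, k, k) -> (batch, 1, 1)
window_max = lambda w: w.amax(dim=(1, 2), keepdim=True)
```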

Ok, so as far as I understand, the operation is equivalent to a convolution in terms of size, stride, etc., but instead of the dot product you would like to perform some other operation.

Tensor comprehensions might work, but they currently don’t support padding.

What do you think of using im2col as a potentially ugly hack?
Would that work for your function?
Have a look at Pete Warden’s blog post.
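To sketch the im2col route in PyTorch terms: F.unfold extracts every window as a column, so the custom function could be applied to all windows in one batched call instead of a Python loop. This assumes your function can accept a batch of windows shaped (N, k, k) and return (N, 1, 1), which may or may not hold for your case:

```python
import torch
import torch.nn.functional as F

def apply_via_unfold(inputs, fn, kernel_size):
    """im2col-style sketch for a (batch, width, height) input.
    Assumes odd kernel_size, zero padding, and that fn maps a
    batch of (N, k, k) windows to (N, 1, 1)."""
    batch, width, height = inputs.shape
    pad = kernel_size // 2
    # (batch, k*k, width*height): one flattened window per column
    cols = F.unfold(inputs.unsqueeze(1), kernel_size, padding=pad)
    # reshape each column back into a (k, k) window, folding the
    # batch and window-position dims together
    windows = cols.transpose(1, 2).reshape(-1, kernel_size, kernel_size)
    out = fn(windows)                 # (batch * width * height, 1, 1)
    return out.reshape(batch, width, height)

# same example stand-in: per-window max
window_max = lambda w: w.amax(dim=(1, 2), keepdim=True)
```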