Custom CNN with softmax applied to each conv operation

I am trying to write a custom CNN layer that applies softmax to each convolution operation, so that each pixel in the output image has a value in [0, 1] and is the sum over the convolved pixels. An example TensorFlow implementation can be seen here.

Ideally, this should be trained with binary cross-entropy loss. I tried the code below, but it does not train.

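import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionModel(nn.Module):
    def __init__(self):
        super(TransitionModel, self).__init__()

        # Single 3x3 kernel: [out_channels, in_channels, height, width]
        self.weight = nn.Parameter(torch.empty(1, 1, 3, 3))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.softmax = nn.Softmax(dim=1)

    def forward(self, s, test=False):
        # Softmax is applied to the kernel weights, then used for the convolution
        return F.conv2d(s, self.softmax(self.weight), padding=1)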

model = TransitionModel()
pred_s_p = model(s)
pred_s_p = pred_s_p.squeeze()
loss = F.binary_cross_entropy(pred_s_p, s_p)

s.shape = [1, 1, 32, 32]
pred_s_p.shape = [32, 32]

It throws the following error:

RuntimeError: reduce failed to synchronize: cudaErrorAssert: device-side assert triggered

Could you check whether your code runs on the CPU?
I assume F.binary_cross_entropy might throw an error, as it expects probabilities as the model output, while you are passing the raw output as pred_s_p.
Try to use F.binary_cross_entropy_with_logits instead.
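
To illustrate the difference between the two loss functions, here is a minimal sketch on made-up tensors (the names `logits` and `target` and the random data are assumptions for illustration, not taken from the code above). F.binary_cross_entropy_with_logits applies a sigmoid internally, so it accepts unbounded values, while F.binary_cross_entropy needs inputs already in [0, 1]:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(32, 32) * 2      # unbounded raw scores (illustrative data)
target = torch.rand(32, 32)           # targets in [0, 1]

# Applies sigmoid internally, so any real-valued input is accepted.
loss_logits = F.binary_cross_entropy_with_logits(logits, target)

# Matches the line above up to numerical precision, but is only valid
# because sigmoid squashes the input into (0, 1) first.
loss_probs = F.binary_cross_entropy(torch.sigmoid(logits), target)
```

The with_logits variant is usually the numerically safer of the two when the model output is not already a probability.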

The model and data are on the GPU. F.binary_cross_entropy_with_logits works, but the loss does not change. Since I am applying softmax, the output values should be probabilities, which is why I thought I should use F.binary_cross_entropy.

You are applying the softmax to the weights, not the output.
Depending on the distribution of your input, you might not get probabilities as the output, which would raise an error such as:

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -1.429985
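
A minimal sketch of this point, using a random stand-in input (the variable names and data are illustrative assumptions). It also shows something not stated in the posts above but which follows from how softmax works: on a [1, 1, 3, 3] weight, dim=1 has a single entry, so the softmax returns 1.0 everywhere and passes no gradient to the weight, which may be why the loss does not change.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
weight = torch.randn(1, 1, 3, 3, requires_grad=True)
s = torch.randn(1, 1, 32, 32)   # illustrative input, not restricted to [0, 1]

# dim=1 of a [1, 1, 3, 3] tensor has one entry, so every value becomes 1.0.
kernel_dim1 = F.softmax(weight, dim=1)

# Normalizing over the 9 kernel entries instead (an assumption about the intent).
kernel_flat = F.softmax(weight.reshape(1, 1, -1), dim=-1).reshape(1, 1, 3, 3)

# Even with a normalized kernel the output is a weighted average of the input
# pixels, so it leaves [0, 1] whenever the input does.
out = F.conv2d(s, kernel_flat, padding=1)

# The dim=1 softmax is constant in the weight, so its gradient is zero.
F.conv2d(s, kernel_dim1, padding=1).sum().backward()
grad_dim1 = weight.grad.clone()
```

Here `grad_dim1` is all zeros, and `out` contains negative values because `s` does.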

I see. Any idea how I can apply softmax to the output of each conv operation?

Thanks for quick responses btw.

You could just call it directly on the output of F.conv2d:

return self.softmax(F.conv2d(s, self.weight, padding=1))
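
Which values get normalized depends on the dim argument. As a hedged sketch on a random stand-in for the conv output (the tensor `out` and its data are assumptions for illustration): with nn.Softmax(dim=1) the softmax runs over the channel axis, and normalizing over the pixels of each image would need the spatial axes flattened first.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
out = torch.randn(1, 1, 32, 32)   # stand-in for the F.conv2d output

# dim=1 is the channel axis; with a single channel every value becomes 1.0.
per_channel = F.softmax(out, dim=1)

# Softmax over all pixels of each image: flatten height and width first.
per_image = F.softmax(out.flatten(start_dim=2), dim=-1).view_as(out)
```

With one output channel, `per_channel` carries no information, while `per_image` sums to 1 over the 32 x 32 pixels.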

Doesn’t this apply the softmax over all the pixels in the output image? I need to apply it to each conv operation. Let’s say we have a kernel of size [3, 3] and an image of size [10, 10].

The layer I want should do:

softmax(image[0:3, 0:3] * kernel), softmax(image[0:3, 1:4] * kernel) ... 
softmax(image[1:4, 0:3] * kernel), softmax(image[1:4, 1:4] * kernel) ...  

Does this make sense?
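
One possible reading of the layout above can be sketched with F.unfold, which extracts every sliding window as a column. This is only an interpretation, not a confirmed answer from the thread: the helper name `patchwise_softmax` is hypothetical, a single input channel is assumed, and the softmax is taken over the 9 elementwise products within each window.

```python
import torch
import torch.nn.functional as F

def patchwise_softmax(image, kernel, padding=0):
    """Softmax over the elementwise products of each window (hypothetical helper).

    Assumes image is [N, 1, H, W] and kernel is [k_h, k_w].
    """
    k_h, k_w = kernel.shape[-2:]
    # patches: [N, k_h * k_w, L], one column per window position
    patches = F.unfold(image, kernel_size=(k_h, k_w), padding=padding)
    # Elementwise product of every window with the flattened kernel.
    products = patches * kernel.reshape(1, -1, 1)
    # Normalize within each window, i.e. over its k_h * k_w products.
    return F.softmax(products, dim=1)

torch.manual_seed(0)
image = torch.rand(1, 1, 10, 10)
kernel = torch.randn(3, 3)
probs = patchwise_softmax(image, kernel)   # shape [1, 9, 64]: 8 x 8 windows
```

Each column of `probs` holds 9 values in [0, 1] that sum to 1. How those 9 values should be reduced to a single output pixel is not specified above, so that step is left open here.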