Customized batchnorm: backward function problem

Hello everyone,
I am trying to implement a batch normalization layer, and I get an error when I try to train with it.
I assume the forward pass is correct, because I used a model containing the layer for inference and it worked. For the backward pass, though, I have no idea what exactly I should do.

import torch

class NormGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, alpha, betha, batch_size):
        ctx.save_for_backward(input, alpha, betha)
        input = input*alpha + betha
        return input

    @staticmethod
    def backward(ctx, grad_output):
        #still not finished
        input, alpha, betha = ctx.saved_tensors
        grad_input = grad_output*alpha
        grad_alpha = grad_output.mul(input)
        grad_alpha = grad_alpha.sum(axis=(0,1,2))
        grad_betha = grad_output.sum(axis=(0,1,2))
        return grad_input, grad_alpha, grad_betha

I don't know whether what I am doing is correct, but the first error I am facing right now says that 4 gradients were expected to be returned, while I am returning only 3.


Make sure to read the note on implementing custom Functions here.
In particular, backward must return as many values as forward takes inputs (not counting ctx). No gradient is needed for batch_size, since I assume it's a Python number, so you should return None for it in backward.
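A minimal sketch of the Function with one return value per forward input, under stated assumptions: alpha and betha are tensors that broadcast against input (here a hypothetical NCHW input with per-channel parameters of shape (C, 1, 1)), and batch_size is a plain Python number. Instead of hard-coding axis=(0,1,2), which only fits one particular layout, it uses Tensor.sum_to_size to reduce each parameter gradient back to that parameter's shape. torch.autograd.gradcheck then compares the hand-written backward against finite differences.

```python
import torch

class NormGrad(torch.autograd.Function):
    """Affine part of batch norm: out = input * alpha + betha."""

    @staticmethod
    def forward(ctx, input, alpha, betha, batch_size):
        # batch_size is assumed to be a plain Python number: not saved, no gradient
        ctx.save_for_backward(input, alpha, betha)
        return input * alpha + betha

    @staticmethod
    def backward(ctx, grad_output):
        input, alpha, betha = ctx.saved_tensors
        # d(out)/d(input) = alpha, broadcast to the input's shape
        grad_input = grad_output * alpha
        # alpha and betha were broadcast in forward, so sum the gradient
        # over the broadcast dimensions to get back to their own shapes
        grad_alpha = (grad_output * input).sum_to_size(alpha.shape)
        grad_betha = grad_output.sum_to_size(betha.shape)
        # One entry per forward input: input, alpha, betha, batch_size
        return grad_input, grad_alpha, grad_betha, None

# Hypothetical shapes for the check; gradcheck expects double precision
x = torch.randn(2, 3, 4, 4, dtype=torch.double, requires_grad=True)
alpha = torch.randn(3, 1, 1, dtype=torch.double, requires_grad=True)
betha = torch.randn(3, 1, 1, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(
    lambda a, b, c: NormGrad.apply(a, b, c, 2), (x, alpha, betha)
)
print(ok)  # True
```

If the parameters are stored channels-last instead (a 1-D alpha matching the last dimension), sum_to_size reduces over the first three dimensions, which is what the original axis=(0,1,2) did.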

Also note that you can avoid loops like this:

        for kk in range(batch_size):
            part1[kk] = part1[kk]*(alpha)

By using broadcasting:

        part1 = part1 * alpha

Also, this won't change part1 in place; it only makes the Python name part1 point to a new Tensor. So you won't need the clone.
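A small sketch of the two points above, with hypothetical shapes: the broadcast product gives the same values as the row-by-row loop, and rebinding part1 leaves the tensor it previously named untouched, which is why no clone is needed.

```python
import torch

torch.manual_seed(0)
part1 = torch.randn(5, 3)  # hypothetical stand-in: 5 rows (e.g. a batch)
alpha = torch.randn(3)     # one scale per column

# Loop version, applied to a copy so part1 itself is left alone
looped = part1.clone()
for kk in range(looped.shape[0]):
    looped[kk] = looped[kk] * alpha

# Broadcast version: alpha of shape [3] is applied to every row at once
original = part1
snapshot = original.clone()
part1 = part1 * alpha      # builds a new tensor and rebinds the name part1

print(torch.allclose(looped, part1))    # True: same values as the loop
print(part1 is original)                # False: part1 now names a new tensor
print(torch.equal(original, snapshot))  # True: the old tensor was not modified
```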

Thanks for all the advice.
Actually, I had already added the None before, but then another problem popped up, so I thought that maybe in fixing the first problem I had introduced another one.
As for the multiplication operations: yes, I know that, but I had spent so much time trying to solve the problem that I became doubtful of everything, so I changed them to that format anyway. Either way, the problem is still there.
The error I got when I did that was:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 32, 32]], which is output 0 of SelectBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
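One way to see where this kind of error comes from is a small, hypothetical stand-in for the loop quoted earlier in the thread. In part1[kk] = part1[kk] * alpha, the multiplication saves the view part1[kk] for computing the gradient of alpha, and the assignment then overwrites that same memory in place, so backward finds a saved tensor at a newer version than expected. This sketch only assumes the shapes shown; the exact node name in the message (SelectBackward, SelectBackward0, ...) depends on the PyTorch version.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)   # hypothetical input
alpha = torch.randn(3, requires_grad=True)  # hypothetical per-column scale

def scale_rows_inplace(x, alpha):
    part1 = x * 2.0  # stand-in for an intermediate result
    for kk in range(part1.shape[0]):
        # part1[kk] * alpha saves the view part1[kk] for backward,
        # then the assignment overwrites that same memory in place
        part1[kk] = part1[kk] * alpha
    return part1

def scale_rows_broadcast(x, alpha):
    part1 = x * 2.0
    return part1 * alpha  # out of place: nothing saved for backward is modified

try:
    scale_rows_inplace(x, alpha).sum().backward()
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)
print(inplace_error)

# torch.autograd.grad returns the gradients without accumulating into .grad
grad_x, grad_alpha = torch.autograd.grad(
    scale_rows_broadcast(x, alpha).sum(), (x, alpha)
)
print(grad_alpha)
```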

When you enable anomaly detection as suggested, what is the forward function that is to blame?

That's the thing: the suggested action did not help at all. I get the same output.

Any other suggestions ?

It should add a second stack trace above the one showing the error. It doesn’t?

Right, it doesn't :frowning: It only added a sentence saying that the error is in one of the operations in the trace above, followed by "Good luck!", so nothing new.

Do you hide the python warnings?
Also, if you're using a notebook, you'll have to use a nightly build to see this warning, as support for it was added only recently :confused:
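A minimal sketch of enabling anomaly detection, reusing the same hypothetical in-place loop. torch.autograd.detect_anomaly() is the context-manager form of torch.autograd.set_detect_anomaly(True); it needs to be active while the forward pass runs, since that is when the traceback of each operation is recorded. Because the extra trace arrives as a Python warning, the sketch records warnings explicitly and prints them; whether that trace shows up, and its exact wording, may vary between PyTorch versions and environments.

```python
import warnings
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)   # hypothetical input
alpha = torch.randn(3, requires_grad=True)  # hypothetical scale

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    # Anomaly mode must cover the forward pass, where tracebacks are recorded
    with torch.autograd.detect_anomaly():
        part1 = x * 2.0
        for kk in range(part1.shape[0]):
            part1[kk] = part1[kk] * alpha  # the in-place write that breaks backward
        try:
            part1.sum().backward()
            message = None
        except RuntimeError as err:
            message = str(err)

# Depending on the version, one of these warnings carries the forward traceback
for w in caught:
    print(w.message)
print(message)
```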

As a matter of fact, I am not hiding the warnings, but I will try upgrading to a newer version (if one exists).