I am trying to implement a batch normalization layer, and I am getting an error when training with it.
I assume the forward pass is correct, because I used a model containing the layer for inference and it worked, but for the backward I have no idea what exactly I should do.
    def forward(ctx, input, alpha, betha, batch_size):
        ctx.save_for_backward(input, alpha, betha)
        return input * alpha + betha

    def backward(ctx, grad_output):
        # still not finished
        input, alpha, betha = ctx.saved_tensors
        grad_input = None
        grad_betha = None
        grad_alpha = grad_output.mul(input)
        grad_alpha = grad_alpha.sum(axis=(0, 1, 2))
        grad_betha = grad_output.sum(axis=(0, 1, 2))
        return grad_input, grad_alpha, grad_betha
So, I do not know whether what I am doing is correct or not, but the first error I am facing right now says that 4 values were expected to be returned while I am returning only 3.
Make sure to read the note on implementing custom Functions here.
In this case, your backward needs to return as many things as the forward had inputs. In your case, no gradient is needed for batch_size, as I assume it is a Python number, so you should return None for it in the backward.
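Putting that together, here is a minimal sketch of what such a Function could look like, assuming alpha and betha are scalar parameters (the class name Affine and the exact gradient formulas are illustrative, not from the original post):

```python
import torch

class Affine(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, alpha, betha, batch_size):
        # batch_size is a plain Python number; only tensors are saved.
        ctx.save_for_backward(input, alpha, betha)
        return input * alpha + betha

    @staticmethod
    def backward(ctx, grad_output):
        input, alpha, betha = ctx.saved_tensors
        grad_input = grad_output * alpha
        grad_alpha = (grad_output * input).sum(dim=(0, 1, 2))
        grad_betha = grad_output.sum(dim=(0, 1, 2))
        # Four forward inputs -> four return values; None for batch_size.
        return grad_input, grad_alpha, grad_betha, None
```

The gradients can be checked numerically with torch.autograd.gradcheck on double-precision inputs.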
Also note that you can avoid loops like this:
    for kk in range(batch_size):
        part1[kk] = part1[kk] * alpha
By using broadcasting:
    part1 = part1 * alpha
Also note that this won't change part1 in place (it just changes the Tensor the Python object points to), so you won't need the clone.
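A small sketch of the two points above (the shapes and names here are just for illustration): the broadcast multiply produces the same values as the loop, and the original tensor is left untouched:

```python
import torch

batch_size = 4
part1 = torch.ones(batch_size, 3)
alpha = torch.tensor(2.0)

# Loop version: one in-place row update per sample.
looped = part1.clone()
for kk in range(batch_size):
    looped[kk] = looped[kk] * alpha

# Broadcast version: one vectorized op that returns a NEW tensor,
# so part1 itself is never modified.
broadcast = part1 * alpha
```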
Thanks for all the advice.
Actually, I did add the None before, and another problem popped up, so I thought maybe I had tried to solve the first problem by introducing another.
As for the multiplication operations: indeed, I know that, but I spent so much time trying to solve the problem that I became doubtful of everything, so I changed them to that format anyway; the problem is still there either way.
The error that I got when I did that was:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 32, 32]], which is output 0 of SelectBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
When you enable anomaly detection as suggested, what is the forward function that is to blame?
This is the thing: the suggested action did not help at all, same output.
It should add a second stack trace above the one showing the error. It doesn’t?
Yes, it does not. It just added a sentence saying that the error is in one of the operations in the trace above, "Good luck!", so nothing new.
Do you hide the Python warnings?
Also, if you're using a notebook, you'll have to use a nightly build to be able to see this warning, as support for notebooks was added only recently.
As a matter of fact, I am not hiding the warnings, but I will try to upgrade to a newer version, if one exists.