I am trying to implement a batch normalization layer, and I am getting an error when training with it.
I assume the forward pass is correct, because I used a model containing the layer for inference and it worked, but for the backward I have no idea what exactly I should do.
    def forward(ctx, input, alpha, betha, batch_size):
        ctx.save_for_backward(input, alpha, betha)
        return input * alpha + betha

    def backward(ctx, grad_output):
        # still not finished
        input, alpha, betha = ctx.saved_tensors
        grad_input = None
        grad_betha = None
        grad_alpha = grad_output.mul(input)
        grad_alpha = grad_alpha.sum(axis=(0, 1, 2))
        grad_betha = grad_output.sum(axis=(0, 1, 2))
        return grad_input, grad_alpha, grad_betha
So, I do not know whether what I am doing is correct or not, but the first error I am facing right now says that 4 values were expected to be returned while I am returning only 3.
Make sure to read the note on implementing custom Functions here.
In this case, your backward needs to return as many things as the forward had inputs. In your case, no gradient is needed for batch_size, as I assume it is a Python number, so you should return None for it in the backward.
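Putting that together, here is a minimal sketch of what such a Function could look like, assuming alpha and betha are scalar parameters (the class name Affine and the exact gradient formulas are illustrative, not from the original post):

```python
import torch

class Affine(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, alpha, betha, batch_size):
        # batch_size is a plain Python number; only tensors are saved.
        ctx.save_for_backward(input, alpha, betha)
        return input * alpha + betha

    @staticmethod
    def backward(ctx, grad_output):
        input, alpha, betha = ctx.saved_tensors
        grad_input = grad_output * alpha
        grad_alpha = (grad_output * input).sum(dim=(0, 1, 2))
        grad_betha = grad_output.sum(dim=(0, 1, 2))
        # Four forward inputs -> four return values; None for batch_size.
        return grad_input, grad_alpha, grad_betha, None
```

The gradients can be checked numerically with torch.autograd.gradcheck on double-precision inputs.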
Also note that you can avoid loops like this:
    for kk in range(batch_size):
        part1[kk] = part1[kk] * alpha
By using broadcasting:
    part1 = part1 * alpha
Also note that this won't change part1 in place (it just changes the Tensor the Python object points to), so you won't need the clone.
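A small sketch of the two points above (the shapes and names here are just for illustration): the broadcast multiply produces the same values as the loop, and the original tensor is left untouched:

```python
import torch

batch_size = 4
part1 = torch.ones(batch_size, 3)
alpha = torch.tensor(2.0)

# Loop version: one in-place row update per sample.
looped = part1.clone()
for kk in range(batch_size):
    looped[kk] = looped[kk] * alpha

# Broadcast version: one vectorized op that returns a NEW tensor,
# so part1 itself is never modified.
broadcast = part1 * alpha
```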
Thanks for all the advice.
Actually, I did add the None before, and another problem popped up, so I thought maybe I had tried to solve the first problem by introducing another.
As for the multiplication operations: indeed, I know that, but I spent so much time trying to solve the problem that I became doubtful of everything, so I changed them to that format anyway; the problem is still there either way.
The error that I got when I did that was:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 32, 32]], which is output 0 of SelectBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
When you enable anomaly detection as suggested, what is the forward function that is to blame?
This is the thing: the suggested action did not help at all, same output.
It should add a second stack trace above the one showing the error. It doesn’t?
Yes, it does not. It just added a sentence saying that the error is in one of the operations in the trace above, "Good luck!", so nothing new.
Do you hide the Python warnings?
Also, if you're using a notebook, you'll have to use a nightly build to be able to see this warning, as support for notebooks was added only recently.
As a matter of fact, I am not hiding the warnings, but I will try to upgrade to a newer version, if one exists.