I am writing an nn.Module with my own layer; say it is just nn.BatchNorm.
I would like to normalize the gradient in the backward pass before it is fed to the lower layers, so that it only carries information about the direction. I want to normalize the gradient at that single layer only, not the gradients of all layers.
How could I do this without writing my own full backward pass, just by using the gradients autograd already computes and then normalizing them?
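To be concrete, by "normalize" I mean rescaling the gradient so that only its direction survives, roughly g / ||g||. A rough sketch of the operation I have in mind (normalize_grad and eps are just names I made up for illustration):

import torch

def normalize_grad(grad, eps=1e-12):
    # keep only the direction: divide the gradient by its L2 norm
    return grad / (grad.norm() + eps)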
I imagine it something like this:
import torch.nn as nn
import torch.nn.functional as F

class MyBatchNorm(nn.Module):
    def __init__(self, in_features):
        super(MyBatchNorm, self).__init__()
        self.fc1_bn = nn.BatchNorm1d(in_features)

    def forward(self, x):
        # normalize data by BN
        out = self.fc1_bn(x)
        return out

    def backward(self, grad_output):
        # this is the part I do not know how to write: nn.Module has no
        # backward() I can override like this, I just want autograd's
        # gradients for this layer normalized before they propagate down
        grad_input, grad_weight, grad_bias = self.fc1_bn.backward(grad_output)
        return F.normalize(grad_input), F.normalize(grad_weight), F.normalize(grad_bias)
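In case it makes the intent clearer, the closest I have come up with so far is to drop the backward() method entirely and register a hook on the input tensor inside forward, so autograd computes the gradient as usual and the hook only rescales it before it reaches the lower layers. This is just a sketch, not something I know to be the idiomatic way; the dim=1 in F.normalize is my own assumption about normalizing per sample:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyBatchNorm(nn.Module):
    def __init__(self, in_features):
        super(MyBatchNorm, self).__init__()
        self.fc1_bn = nn.BatchNorm1d(in_features)

    def forward(self, x):
        if x.requires_grad:
            # the hook fires during backward with the gradient w.r.t. x,
            # i.e. exactly the gradient that is about to flow to the layer
            # below; returning a new tensor replaces that gradient
            x.register_hook(lambda grad: F.normalize(grad, dim=1))
        return self.fc1_bn(x)

Would something like this be the right direction, or is register_full_backward_hook on the module the intended way to replace grad_input?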