I am attempting to implement the following operation from this paper:
This is what it looks like in code:
import torch
import torch.nn as nn

class FrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm1d(80, affine=False)
        # Learnable exponent, squashed to (0, 1) via sigmoid in forward()
        self.register_parameter('alpha', nn.Parameter(torch.tensor(0.5)))

    def forward(self, x):
        bs, im_num, ch, y_dim, x_dim = x.shape
        x = x ** torch.sigmoid(self.alpha)  # <----- line causing issues
        x = x.view(-1, y_dim, x_dim)        # flatten to (N, y_dim, x_dim) for BatchNorm1d
        x = self.bn(x)
        return x.view(bs, im_num, ch, y_dim, x_dim)
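For reference, a minimal sketch of how the module is driven; the exact shapes below are illustrative assumptions on my part (y_dim = 80 to match BatchNorm1d(80), the variable name fe is hypothetical):

fe = FrontEnd()
# (batch, images per sample, channels, y_dim, x_dim); y_dim must equal 80
x = torch.rand(4, 3, 1, 80, 100)
out = fe(x)
print(out.shape)  # torch.Size([4, 3, 1, 80, 100])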
If I set the alpha parameter to not require grad, everything is fine. If, however, I make it learnable, the loss turns to NaN after a single iteration.
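(By "not require grad" I mean freezing the parameter, i.e. something like:)

# Freezing alpha like this avoids the NaNs
model.frontend.alpha.requires_grad_(False)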
When running with anomaly detection, this is the message that I get:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-19-953c2962d9a9> in <module>
10 loss = criterion(outputs, labels)
11 with autograd.detect_anomaly():
---> 12 loss.backward()
13 # print(model.frontend.alpha.grad)
14 optimizer.step()
/opt/conda/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
193 products. Defaults to ``False``.
194 """
--> 195 torch.autograd.backward(self, gradient, retain_graph, create_graph)
196
197 def register_hook(self, hook):
/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: Function 'PowBackward1' returned nan values in its 1th output.
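For what it's worth, here is a minimal sketch that triggers the same PowBackward1 NaN in isolation, assuming the input contains zeros (which I have not verified for my data). The gradient of x ** p with respect to p involves log(x), and log(0) = -inf:

import torch

x = torch.tensor([0.0, 2.0])                   # base with a zero entry
alpha = torch.tensor(0.5, requires_grad=True)  # learnable exponent, as above

y = (x ** torch.sigmoid(alpha)).sum()          # forward pass is fine: 0 ** p = 0
y.backward()
print(alpha.grad)  # tensor(nan): d/dp x**p = x**p * log(x), NaN at x = 0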
If I register a backward hook on frontend as follows:
def printgradvals(module, grad_input, grad_output):
    # NaN checks (a NaN is the only value that compares unequal to itself)
    print(grad_input[0].ne(grad_input[0]).any())
    print(grad_output[0].ne(grad_output[0]).any())
    # Gradient magnitudes
    print(grad_input[0].abs().mean())
    print(grad_output[0].abs().mean())
    print(grad_input[0].abs().min())
    print(grad_output[0].abs().min())

model.frontend.register_backward_hook(printgradvals)
I get the following output:
tensor(False, device='cuda:0')
tensor(False, device='cuda:0')
tensor(1.9707e-05, device='cuda:0')
tensor(1.9707e-05, device='cuda:0')
tensor(1.3642e-12, device='cuda:0')
tensor(1.3642e-12, device='cuda:0')
The gradients are very small. But why doesn't the input grad differ from the output grad? And what could be causing the NaNs?
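In case it helps, I could also hook the parameter tensor directly to watch its gradient; a sketch, reusing the model.frontend names from above:

def report_alpha_grad(grad):
    print('alpha grad:', grad, 'isnan:', torch.isnan(grad).any())

model.frontend.alpha.register_hook(report_alpha_grad)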
Thank you very much for any help you can give me on this.
EDIT: I just noticed that I initialize alpha to 0.5 in the code I copied above. I also tried initializing it to zero; nothing changes, I get the same results.