I am attempting to implement the following operation from this paper:

This is what it looks like in code:

```
class FrontEnd(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm1d(80, affine=False)
        self.register_parameter('alpha', torch.nn.Parameter(torch.tensor(0.5)))

    def forward(self, x):
        bs, im_num, ch, y_dim, x_dim = x.shape
        x = x ** torch.sigmoid(self.alpha)  # <----- line causing issues
        x = x.view(-1, y_dim, x_dim)
        x = self.bn(x)
        return x.view(bs, im_num, ch, y_dim, x_dim)
```

If I set the `alpha` parameter to not require grad, everything is fine. If I make it learnable, however, the loss turns to `nan` after a single iteration.
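My current suspicion (a plain-Python sketch of the textbook formula, not anything from the paper): the derivative of `x ** a` with respect to the exponent is `x**a * ln(x)`, which breaks down whenever `x` is zero or negative, and my inputs can contain exact zeros. The hypothetical helper `dpow_dexp` below is just to illustrate:

```python
import math

def dpow_dexp(x, a):
    """Analytic d/da of x**a, i.e. x**a * ln(x); breaks down for x <= 0."""
    if x == 0.0:
        # forward 0**a is fine for a > 0, but ln(0) = -inf,
        # and 0 * -inf evaluates to nan
        return 0.0 ** a * float('-inf')
    return (x ** a) * math.log(x)

print(dpow_dexp(2.0, 0.5))              # finite (~0.98)
print(math.isnan(dpow_dexp(0.0, 0.5)))  # True: 0 * -inf
```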

When running with anomaly detection, this is the message that I get:

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-19-953c2962d9a9> in <module>
10 loss = criterion(outputs, labels)
11 with autograd.detect_anomaly():
---> 12 loss.backward()
13 # print(model.frontend.alpha.grad)
14 optimizer.step()
/opt/conda/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
193 products. Defaults to ``False``.
194 """
--> 195 torch.autograd.backward(self, gradient, retain_graph, create_graph)
196
197 def register_hook(self, hook):
/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: Function 'PowBackward1' returned nan values in its 1th output.
```

If I register a backward hook on `frontend` as follows:

```
def printgradvals(self, grad_input, grad_output):
    print(grad_input[0].ne(grad_input[0]).any())
    print(grad_output[0].ne(grad_output[0]).any())
    print(grad_input[0].abs().mean())
    print(grad_output[0].abs().mean())
    print(grad_input[0].abs().min())
    print(grad_output[0].abs().min())

model.frontend.register_backward_hook(printgradvals)
```

I get the following output:

```
tensor(False, device='cuda:0')
tensor(False, device='cuda:0')
tensor(1.9707e-05, device='cuda:0')
tensor(1.9707e-05, device='cuda:0')
tensor(1.3642e-12, device='cuda:0')
tensor(1.3642e-12, device='cuda:0')
```

The gradients are very small. But why doesn't the input grad differ from the output grad? And what could be causing the `nan`s?

Thank you very much for any help that you can give me on this.

EDIT: I just noticed that I initialize `alpha` to 0.5 in the code I copied above. I also tried initializing it to zero, with no change: same results.