How to avoid overflow when computing the gradient of pow?

Hi everyone,

I am trying to implement a function that takes a 4-D Tensor as input and raises its absolute value to a power alpha while preserving the sign. The trick is that alpha is a trainable parameter, but when the gradient is computed w.r.t. alpha, I get nan, which is probably due to overflow.

I have tried both autograd.Function and nn.Module, but no luck fixing the issue so far. Here is my code with autograd.Function:

import torch

N, D_in = 64, 1000  # example sizes

a = torch.tensor(2., requires_grad=True)

class Power(torch.autograd.Function):
  @staticmethod
  def forward(ctx, x, alpha):
    result = x.sign() * torch.abs(x) ** alpha
    ctx.save_for_backward(result)
    return result

  @staticmethod
  def backward(ctx, grad_output):
    result, = ctx.saved_tensors
    return result * torch.log(result.abs() + 1e-6), None

power = Power.apply
x = torch.randn(N, D_in, device='cpu', dtype=torch.float)
out = torch.sum(power(x, a))
out.backward()
out.grad  # None here

and using nn.Module:

import torch.nn as nn

class Power(nn.Module):
  def __init__(self, alpha=2.):
    super(Power, self).__init__()
    self.alpha = nn.Parameter(torch.tensor(alpha))

  def forward(self, x):
    return x.sign() * torch.abs(x) ** self.alpha

Is there a way to fix the function properly so it avoids the overflow / nans? Thanks in advance.

Hi,

In your custom Function, since you compute the gradient w.r.t. alpha, the not-None result should be the second one, no?
To confirm that the problem is overflow, I would add some prints to find out exactly which op produces the nan in your custom Function. To detect nans in a Tensor t, you can do t.ne(t).any().
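
For example, something along these lines (just a rough sketch, untested; the eps value is arbitrary and only there to keep the log finite):

import torch

class Power(torch.autograd.Function):
  @staticmethod
  def forward(ctx, x, alpha):
    result = x.sign() * torch.abs(x) ** alpha
    # save the input and alpha as well, both are needed in backward
    ctx.save_for_backward(x, alpha, result)
    return result

  @staticmethod
  def backward(ctx, grad_output):
    x, alpha, result = ctx.saved_tensors
    eps = 1e-6  # arbitrary guard against log(0) and 0 ** negative
    # d/dx [sign(x) * |x|**alpha] = alpha * |x|**(alpha - 1)
    grad_x = grad_output * alpha * torch.abs(x).clamp(min=eps) ** (alpha - 1)
    # d/dalpha [sign(x) * |x|**alpha] = sign(x) * |x|**alpha * log|x|
    grad_alpha = (grad_output * result * torch.log(torch.abs(x) + eps)).sum()
    # gradients are returned in the same order as the forward inputs (x, alpha)
    return grad_x, grad_alpha

# t.ne(t).any() tells you whether a tensor t contains any nan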

Hi, thanks for your response! In general, should the output order in backward follow the order of the inputs? If in forward I had forward(ctx, alpha, x), should backward then return grad_wrt_alpha, grad_x in that order?

Yes, the order follows the order of the arguments of the forward.
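
If you want to double-check your backward, torch.autograd.gradcheck is a handy sanity check (it wants double precision and small inputs), e.g.:

x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
alpha = torch.tensor(2.0, dtype=torch.double, requires_grad=True)
# compares the analytical gradients from your backward against numerical ones
print(torch.autograd.gradcheck(Power.apply, (x, alpha), eps=1e-6, atol=1e-4))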