Custom function gradcheck

agadetsky · December 11, 2017, 3:09am

I have some problems with pytorch gradcheck.

class KL(Function):
    # vectorized
    # input is batch_size x dim
    # Note that both forward and backward are @staticmethods
    @staticmethod
    def forward(ctx, logits, beta=1):
        log_z = 94.358391688507595 # partition function for dim=300
        ctx.save_for_backward(logits)
        ctx.beta = beta
        
        p = logits.sigmoid()
        # more computationally stable entropy
        entropy = (-logits * (p - 1) + F.softplus(-logits).data)
        entropy[entropy != entropy] = 0 # when probs are ones or zeros -> entropy is zero
        a = p[:, :-1]
        b = p[:, 1:]
        binary_potentials = (a * (1 - b) + (1 - a) * b)
        out = entropy.neg().sum(dim=1) + log_z
        out = out + beta * binary_potentials.sum(dim=1)
        out = out.sum(dim=0, keepdim=True)
        
        return out

    @staticmethod
    def backward(ctx, grad_output):
        logits = ctx.saved_variables[0]
        beta = ctx.beta
        
        logits_exp = torch.exp(logits)
        term_1 = (logits_exp * logits) / (logits_exp + 1)**2
        
        p = logits.sigmoid()
        prev = torch.zeros(*p.size())
        prev[:, 1:] = p.data[:, :-1]
        prev = Variable(prev)
        nxt = torch.zeros(*p.size())
        nxt[:, :-1] = p.data[:, 1:]
        nxt = Variable(nxt)
        if p.is_cuda:
            prev = prev.cuda()
            nxt = nxt.cuda()
        mul_term = ( 2 - 2 * prev - 2 * nxt)
        mul_term[:, 0] = mul_term[:, 0] - 1
        mul_term[:, -1] = mul_term[:, -1] - 1
        term_2 = beta * p * (1 - p) * mul_term

        grad_input = term_1 + term_2
        
        grad_input[grad_input.data != grad_input.data] = 0
        

        return grad_output * grad_input, None

I have custom function showed above, and gradcheck fails:

logits = Variable(torch.randn(5, 300), requires_grad=True)
inpt = (logits, 1)
gradcheck(KL.apply, inpt, eps=1e-6, atol=1e-4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 181, in gradcheck
    return fail_test('for output no. %d,\n numerical:%s\nanalytical:%s\n' % (j, numerical, analytical))
  File "/usr/local/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 166, in fail_test
    raise RuntimeError(msg)
RuntimeError: for output no. 0,
 numerical:(
    0
    0
    0
  ⋮
    0
    0
    0
[torch.FloatTensor of size 1500x1]
,)
analytical:(
-0.1566
-0.0927
 0.1289
   ⋮
-0.0009
 0.2219
-0.2261
[torch.FloatTensor of size 1500x1]
,)

But when I check function and its gradients with scipy.optimize.check_grad all is ok and error is less than 1e-6. As I see, the problem is with finite differences, but I can’t understand why. Forward pass seems legit.

Pytorch versions: 0.3.0.post4, 0.2.0.post4’

agadetsky · December 11, 2017, 5:02am

Some problems with shape in application of numerical and analytical formulas, because grad shape must be 5x300, not 1500x1.

albanD · December 11, 2017, 10:42am

Hi,

The matrix that is shown is the Jacbian, so it’s size is always number_of_inputs x number_of_outputs so it is the right size in your case.

You should be careful that and eps of 1e-6 with FloatTensors often lead to loss of precision. I think you want to try to either use DoubleTensor (best idea usually) or increase eps.