How to create output equivalent to tf.gradients()?

I am trying to implement a loss function in PyTorch that requires the gradient of a GAN discriminator's logits with respect to its inputs.

There is already a TensorFlow implementation of this, which uses the tf.gradients function:

    import tensorflow as tf  # TF 1.x graph-mode code (tf.gradients)

    # -----------------------------------------------------------------------------------
    #     JS-Regularizer
    # -----------------------------------------------------------------------------------
    # BATCH_SIZE and DEVICES are globals defined elsewhere in the original script.
    def Discriminator_Regularizer(D1_logits, D1_arg, D2_logits, D2_arg):
        with tf.name_scope('disc_reg'):
            D1 = tf.nn.sigmoid(D1_logits)
            D2 = tf.nn.sigmoid(D2_logits)
            # gradient of sum(logits) with respect to the discriminator inputs
            grad_D1_logits = tf.gradients(D1_logits, D1_arg)[0]
            grad_D2_logits = tf.gradients(D2_logits, D2_arg)[0]
            # per-sample L2 norm of the flattened input gradient
            grad_D1_logits_norm = tf.norm(tf.reshape(grad_D1_logits, [BATCH_SIZE//len(DEVICES), -1]), axis=1)
            grad_D2_logits_norm = tf.norm(tf.reshape(grad_D2_logits, [BATCH_SIZE//len(DEVICES), -1]), axis=1)

            # set keep_dims=True/False such that grad_D_logits_norm.shape == D.shape
            print('grad_D1_logits_norm.shape {} != D1.shape {}'.format(grad_D1_logits_norm.shape, D1.shape))
            print('grad_D2_logits_norm.shape {} != D2.shape {}'.format(grad_D2_logits_norm.shape, D2.shape))
            assert grad_D1_logits_norm.shape == D1.shape
            assert grad_D2_logits_norm.shape == D2.shape

            reg_D1 = tf.multiply(tf.square(1.0-D1), tf.square(grad_D1_logits_norm))
            reg_D2 = tf.multiply(tf.square(D2), tf.square(grad_D2_logits_norm))

            disc_regularizer = tf.reduce_mean(reg_D1 + reg_D2)

        return disc_regularizer
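Reading the TensorFlow code above, the regularizer it computes can be written as follows. The notation is mine: $\varphi$ is the discriminator logit, $D = \sigma(\varphi)$, $x_i$ are the real inputs (D1_arg), $\tilde{x}_i$ are the fake inputs (D2_arg), and $B$ = BATCH_SIZE//len(DEVICES) is the per-device batch size.

```latex
\Omega = \frac{1}{B}\sum_{i=1}^{B}\Big[\big(1 - D(x_i)\big)^2\,\big\lVert \nabla_x \varphi(x_i)\big\rVert_2^2
       + D(\tilde{x}_i)^2\,\big\lVert \nabla_x \varphi(\tilde{x}_i)\big\rVert_2^2\Big]
```

The first term is reg_D1, the second is reg_D2, and the $\frac{1}{B}\sum$ is the tf.reduce_mean.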

Does anyone know how to do the equivalent with autograd? Is this possible in PyTorch? Any help would be greatly appreciated.

My understanding is that tf.gradients(ys, xs) builds the symbolic derivatives of sum(ys) with respect to each x in xs. How is this different from autograd.grad()?
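A minimal sketch of that correspondence, using made-up tensors x1 and x2 in place of the real inputs: tf.gradients(ys, xs) differentiates sum(ys), so in PyTorch the sum has to be made explicit before calling torch.autograd.grad.

```python
import torch

# Hypothetical inputs standing in for the tensors in `xs`.
x1 = torch.randn(4, 3, requires_grad=True)
x2 = torch.randn(4, 3, requires_grad=True)

# A non-scalar output, shaped like discriminator logits [4, 1].
ys = (x1 ** 2 + 3 * x2).sum(dim=1, keepdim=True)

# tf.gradients(ys, [x1, x2]) differentiates sum(ys); with autograd.grad the
# sum is written explicitly (grad_outputs=torch.ones_like(ys) is equivalent).
g1, g2 = torch.autograd.grad(ys.sum(), [x1, x2])

print(torch.allclose(g1, 2 * x1))                    # True
print(torch.allclose(g2, torch.full_like(x2, 3.0)))  # True
```

Like tf.gradients, it returns one gradient per input, each with the shape of that input.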

Michael


I think you're looking for torch.autograd.grad. It computes the gradients of the outputs with respect to the given inputs.
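One detail matters when the gradient is itself part of a loss, as in this regularizer: it has to be built with create_graph=True so that backward() can reach the discriminator's weights. A small sketch, assuming a hypothetical two-layer critic:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical tiny discriminator; any nn.Module would do.
critic = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))

x = torch.randn(5, 3, requires_grad=True)
logits = critic(x)  # shape [5, 1]

# create_graph=True records the gradient computation itself, so a loss built
# from grad_x can be backpropagated into the critic's parameters.
(grad_x,) = torch.autograd.grad(logits.sum(), x, create_graph=True)

penalty = grad_x.pow(2).sum(dim=1).mean()  # mean squared gradient norm
penalty.backward()

print(grad_x.shape)                       # torch.Size([5, 3])
print(critic[0].weight.grad is not None)  # True
```

Without create_graph=True, grad_x would not require grad and penalty.backward() would fail.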

Hi Richard,

Thanks for your response. Yes, I figured that would be the way to do it, but when I try to use it I get:

RuntimeError: grad can be implicitly created only for scalar outputs

The TensorFlow version works fine, though. Do you know what this error means?

The shapes of (outputs, inputs) are ([512, 1], [512, 3]), and both are Variables. Is that the problem?
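A minimal sketch that reproduces the error with the shapes from the post, using a hypothetical linear map in place of the discriminator, and then avoids it by passing grad_outputs explicitly:

```python
import torch

# Shapes from the post: outputs [512, 1], inputs [512, 3].
inputs = torch.randn(512, 3, requires_grad=True)
weight = torch.randn(3, 1)  # hypothetical stand-in for the discriminator
outputs = inputs @ weight   # shape [512, 1], not a scalar

raised = False
try:
    torch.autograd.grad(outputs, inputs)
except RuntimeError as err:
    raised = True
    print(err)  # e.g. grad can be implicitly created only for scalar outputs

# Supplying the upstream gradient (one entry per output element) fixes it.
(grad,) = torch.autograd.grad(outputs, inputs,
                              grad_outputs=torch.ones_like(outputs))
print(grad.shape)  # torch.Size([512, 3])
```

The failed call raises before the backward pass runs, so the graph is still available for the second call.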

The error means autograd can only create the initial gradient implicitly when the output is a scalar. Your outputs have shape [512, 1], so you have to provide it yourself.

The outputs you pass to autograd.grad should be one of the following:

  • A tensor with a single element (effectively a scalar). grad_outputs can then be omitted.
  • A non-scalar tensor, or a tuple of tensors. In this case you must pass grad_outputs= with the same shape(s), e.g. torch.ones_like(outputs), which gives the gradient of the sum of the outputs, like tf.gradients.
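For the tuple case, a short sketch with two made-up outputs of different shapes: grad_outputs becomes a tuple with one tensor per output, and the contributions are summed into a single gradient per input.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)

# Two non-scalar outputs passed together as a tuple.
out_a = 2 * x                # shape [4, 3]
out_b = (x ** 2).sum(dim=1)  # shape [4]

# One grad_outputs tensor per output, each matching that output's shape.
(grad_x,) = torch.autograd.grad(
    outputs=(out_a, out_b),
    inputs=x,
    grad_outputs=(torch.ones_like(out_a), torch.ones_like(out_b)),
)

# d/dx [sum(2x) + sum(x**2)] = 2 + 2x
print(torch.allclose(grad_x, 2 + 2 * x))  # True
```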

If you have found a solution, could you share the code, please?

Thank you!

Example 1:

import torch
xx = torch.rand(1, requires_grad=True)  # Variable is deprecated; tensors track gradients directly
cc = 2*xx
torch.autograd.grad(cc, xx)  # (tensor([2.]),)

Example 2:

import torch
xx = torch.rand(1, requires_grad=True)
cc = 2*xx
gg = 3*cc
torch.autograd.grad(gg, xx)  # chain rule: 3 * 2 -> (tensor([6.]),)

Could you please explain the second case? For example, here.

I was also looking for a PyTorch solution for JS regularization. I was inspired by the gradient penalty of WGAN-GP. Here is the solution I propose:

import torch

def discriminator_regularizer(critic, D1_args, D2_args):
    '''
    JS-Regularizer

    A method that regularizes the critic's input gradients when it
    discriminates real from fake samples. It was proposed to reduce the
    need for a careful choice of architecture, parameter initialization
    and hyperparameters.

    GANs are highly sensitive to these choices. According to "Stabilizing
    Training of Generative Adversarial Networks through Regularization"
    (Roth et al.), this fragility is due to the mismatch, or non-overlapping
    support, between the model distribution and the data distribution.

    :param critic: discriminator network (returns logits)
    :param D1_args: batch of real samples
    :param D2_args: batch of fake samples
    '''
    # Leaf copies of the inputs that require grad, so the logits can be
    # differentiated w.r.t. them (replaces the deprecated Variable(...)).
    D1_args = D1_args.detach().requires_grad_(True)
    D2_args = D2_args.detach().requires_grad_(True)
    D1_logits, D2_logits = critic(D1_args), critic(D2_args)
    D1, D2 = torch.sigmoid(D1_logits), torch.sigmoid(D2_logits)

    # grad_outputs of ones reproduces tf.gradients (gradient of the summed
    # logits); create_graph=True keeps the result differentiable so the
    # penalty can be backpropagated into the critic's parameters.
    grad_D1_logits = torch.autograd.grad(outputs=D1_logits, inputs=D1_args,
            create_graph=True, retain_graph=True,
            grad_outputs=torch.ones_like(D1_logits))[0]

    grad_D2_logits = torch.autograd.grad(outputs=D2_logits, inputs=D2_args,
            create_graph=True, retain_graph=True,
            grad_outputs=torch.ones_like(D2_logits))[0]

    # Per-sample L2 norm of the flattened gradient; keepdim=True gives shape
    # [batch, 1], matching D1/D2 for a critic that outputs [batch, 1] logits.
    grad_D1_logits_norm = torch.norm(grad_D1_logits.reshape(grad_D1_logits.shape[0], -1),
        dim=-1, keepdim=True)

    grad_D2_logits_norm = torch.norm(grad_D2_logits.reshape(grad_D2_logits.shape[0], -1),
        dim=-1, keepdim=True)

    assert grad_D1_logits_norm.shape == D1.shape
    assert grad_D2_logits_norm.shape == D2.shape

    reg_D1 = torch.square(1. - D1) * torch.square(grad_D1_logits_norm)
    reg_D2 = torch.square(D2) * torch.square(grad_D2_logits_norm)
    # Mean over the batch, matching tf.reduce_mean in the TensorFlow version.
    disc_regularizer = torch.mean(reg_D1 + reg_D2)
    return disc_regularizer
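The grad_outputs-of-ones trick in the function above yields per-sample gradients only because each logit depends on its own input sample alone. A critic with batch-coupling layers such as BatchNorm would break this, since the result is really the gradient of the summed logits. A quick check, assuming a hypothetical BatchNorm-free critic, compares the batched result with a per-sample loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical critic with no batch-coupling layers (no BatchNorm).
critic = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))

x = torch.randn(6, 3, requires_grad=True)
logits = critic(x)  # shape [6, 1]

# Batched gradient via grad_outputs of ones.
(batched,) = torch.autograd.grad(logits, x, grad_outputs=torch.ones_like(logits))

# Reference: differentiate each sample's logit separately.
rows = []
for i in range(x.shape[0]):
    xi = x[i:i + 1].detach().requires_grad_(True)
    (gi,) = torch.autograd.grad(critic(xi).sum(), xi)
    rows.append(gi)
per_sample = torch.cat(rows)

print(torch.allclose(batched, per_sample, atol=1e-6))  # True
```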