# How to get the gradient of the loss function twice

Here is what I’m trying to implement:

We calculate the loss based on `F(X)`, as usual. We also define an “adversarial loss”, which is a loss based on `F(X + e)`, where `e` is `dF(X)/dX` multiplied by some constant. The loss and the adversarial loss are summed into the total loss, which is then backpropagated.
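Written as formulas, with the constant called $\epsilon$, the loss function called $\ell$, the target called $y$ (names chosen here for illustration), and $\operatorname{sg}(\cdot)$ denoting the stop-gradient discussed in question 1, the setup is roughly:

```latex
e = \epsilon \, \operatorname{sg}\!\left(\frac{dF(X)}{dX}\right),
\qquad
\mathcal{L}_{\text{total}} = \ell\big(F(X), y\big) + \ell\big(F(X + e), y\big)
```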

In TensorFlow, this part (getting `dF(X)/dX`) can be coded as below:

Below is my PyTorch code:

```python
import torch.nn.functional as F
import torch.optim as optim

# DNetwork, AverageMeter, opt, parameters, adv_embedding, y and loss_total
# come from parts of my code that are not shown in this excerpt.


class DocReaderModel(object):
    def __init__(self, embedding=None, state_dict=None):
        self.train_loss = AverageMeter()
        self.embedding = embedding
        self.network = DNetwork(opt, embedding)
        self.optimizer = optim.SGD(parameters)

    def adversarial_loss(self, batch, loss, embedding, y):
        loss.backward(retain_graph=True)
        # (not shown) grad = dF(X)/dX is read here, detached with grad.detach_(),
        # and used to build adv_embedding

        network_temp = DNetwork(self.opt, adv_embedding)  # network on the perturbed embedding
        network_temp.training = False
        network_temp.cuda()
        start, end, _ = network_temp(batch)  # This is how to get F(X + e)
        del network_temp  # I even deleted this instance.
        return F.cross_entropy(start, y) + F.cross_entropy(end, y)

    def update(self, batch):
        self.network.train()
        start, end, pred = self.network(batch)
        loss = F.cross_entropy(start, y) + F.cross_entropy(end, y)
        # (not shown) loss_total = loss + the value returned by adversarial_loss(...)

        loss_total.backward()
        self.optimizer.step()
```

I have a few questions:

1. I replaced `tf.stop_gradient` with `grad.detach_()`. Is this correct?

2. I was getting `RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.`, so I added `retain_graph=True` to `loss.backward()`. That specific error went away. However, I now get an out-of-memory error after a few epochs (`RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.cu:58`). I suspect I’m retaining the graph unnecessarily.
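The first error can be reproduced with a tiny toy graph that mirrors the structure of the post (the tensors `w`, `x` and the `forward` helper are hypothetical stand-ins, not the poster's model): `loss.backward()` frees the saved buffers of the graph, and a later backward on a total that still contains `loss` walks that same graph a second time.

```python
import torch

# Hypothetical toy stand-ins: a weight vector and a fixed input.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)


def forward():
    # tanh saves its output for the backward pass; that saved tensor is
    # freed by the first backward unless retain_graph=True is passed.
    return torch.tanh(w * x).sum()


# 1) Mirrors the post: backward on `loss`, then backward on a total containing `loss`.
loss = forward()
loss.backward()                    # saved buffers of this graph are freed here
loss_total = loss + forward()      # new graph + the already-freed graph of `loss`
try:
    loss_total.backward()
    second_backward_failed = False
except RuntimeError:               # "Trying to backward through the graph a second time..."
    second_backward_failed = True
print("without retain_graph:", "RuntimeError" if second_backward_failed else "ok")

# 2) retain_graph=True keeps the saved buffers until a later backward frees them,
#    so the same pattern runs. Note that w.grad now accumulates the gradient of
#    `loss` twice (once per backward) plus the gradient of the second forward().
w.grad = None
loss = forward()
loss.backward(retain_graph=True)
loss_total = loss + forward()
loss_total.backward()
print("with retain_graph: ok")
```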

Can someone (@albanD?) tell me what PyTorch’s best practice is here? Any hint, even a short comment, would be highly appreciated.

Hi,

1. Yes, using `.detach()` is the right way to stop gradients from flowing back.
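A minimal sketch of what `.detach()` does, using the toy function `(x ** 2).sum()` chosen only for illustration: `g` plays the role of the gradient used to build the perturbation, and detaching it makes the outer gradient treat `g` as a constant, as `tf.stop_gradient` does. (`.detach()` returns a new tensor; the in-place `.detach_()` used in the question modifies the tensor it is called on.)

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Gradient of the toy function sum(x**2) w.r.t. x, i.e. 2 * x.
# create_graph=True keeps g connected to x so the difference below is visible.
(g,) = torch.autograd.grad((x ** 2).sum(), x, create_graph=True)

# Without detach, the perturbation stays linked to x, so the outer gradient
# also differentiates through g: d/dx (x + 0.1 * 2x) = 1.2 per element.
with_link = (x + 0.1 * g).sum()
(grad_with_link,) = torch.autograd.grad(with_link, x)

# With detach, g is a constant (like tf.stop_gradient): d/dx (x + const) = 1.0.
no_link = (x + 0.1 * g.detach()).sum()
(grad_no_link,) = torch.autograd.grad(no_link, x)

print(grad_with_link)  # approximately 1.2 for each element
print(grad_no_link)    # 1.0 for each element
```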

To get the gradients, you don’t have to do a full backward pass: you can use `torch.autograd.grad` to get the gradients with respect to specific tensors, here `embedding` for example. You will need to pass `create_graph=True` (which also sets `retain_graph=True`).
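A minimal sketch of this approach under stated assumptions: `ToyNet` (embedding, mean pooling, linear layer) is a hypothetical stand-in for `DNetwork`, `dF(X)/dX` is taken to be the gradient of the loss with respect to the embedded input, and `epsilon=0.1` is arbitrary. It reuses the same network for `F(X + e)` instead of constructing a new one, and calls `backward()` once per update without `retain_graph=True`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyNet(nn.Module):
    """Hypothetical stand-in for DNetwork: embedding -> mean-pool -> linear."""

    def __init__(self, vocab=50, dim=8, classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim, classes)

    def forward(self, tokens, perturb=None):
        emb = self.embedding(tokens)   # X
        if perturb is not None:
            emb = emb + perturb        # X + e
        return self.out(emb.mean(dim=1)), emb


def update(net, optimizer, tokens, y, epsilon=0.1):
    net.train()
    optimizer.zero_grad()
    logits, emb = net(tokens)
    loss = F.cross_entropy(logits, y)

    # Gradient of the loss w.r.t. the embedded input only (no full backward,
    # nothing is accumulated into .grad). retain_graph=True because `loss`
    # is backpropagated again as part of loss_total below.
    (grad,) = torch.autograd.grad(loss, emb, retain_graph=True)
    e = epsilon * grad.detach()        # stop-gradient; grad has no graph here anyway

    # Reuse the same network for F(X + e) instead of building a new one.
    adv_logits, _ = net(tokens, perturb=e)
    adv_loss = F.cross_entropy(adv_logits, y)

    loss_total = loss + adv_loss
    loss_total.backward()              # single backward, graph freed afterwards
    optimizer.step()
    return loss_total.item()


net = ToyNet()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
tokens = torch.randint(0, 50, (16, 5))
y = torch.randint(0, 4, (16,))
losses = [update(net, optimizer, tokens, y) for _ in range(5)]
print(losses)
```

Because `e` is detached in this sketch, `retain_graph=True` is enough for the `torch.autograd.grad` call; `create_graph=True` also works (it implies `retain_graph=True`) and is strictly required only if gradients should flow through `e`.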

Also, I am not sure why you set `training=False` on the new net you create.
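If the intent was to put the temporary network into evaluation mode, note that assigning `training = False` only changes the flag on the top-level module, while `.eval()` propagates it to every submodule, such as dropout layers. A small check, with a hypothetical `nn.Sequential` network:

```python
import torch.nn as nn

# Hypothetical network with a Dropout submodule.
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

net.training = False                    # only the top-level flag changes
print(net.training, net[1].training)    # False True

net.train()                             # reset everything to training mode
net.eval()                              # sets training=False on all submodules
print(net.training, net[1].training)    # False False
```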

The last `.backward()` in the `update` function should not need `retain_graph=True`. If you get an error there, it means that some autograd operations are shared across two calls to `update` (and they shouldn’t be).
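A toy illustration of that situation (the tensor `w` and both helper functions are hypothetical): when part of the graph is built once and reused by every call, the second call's `backward()` fails; rebuilding the whole graph inside each call avoids the need for `retain_graph=True`.

```python
import torch

w = torch.randn(3, requires_grad=True)

# Anti-pattern: part of the graph is built once, outside the update function.
h = torch.tanh(w)                      # tanh saves its output for backward


def update_shared(x):
    loss = (h * x).sum()               # reuses the graph of `h` on every call
    loss.backward()                    # first call frees h's saved buffers


def update_fresh(x):
    loss = (torch.tanh(w) * x).sum()   # whole graph rebuilt on every call
    loss.backward()


x = torch.randn(3)
update_shared(x)
try:
    update_shared(x)                   # walks the already-freed part again
    shared_failed = False
except RuntimeError:
    shared_failed = True

for _ in range(3):
    update_fresh(x)                    # works without retain_graph
print("shared graph fails on second call:", shared_failed)
```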
